Undergraduate Texts in Mathematics 
Readings in Mathematics 


A Short Book 
on Long Sums 


Springer 


Undergraduate Texts in Mathematics 


Readings in Mathematics 


Series Editors 


Pamela Gorkin 
Mathematics, Bucknell University, Lewisburg, PA, USA 


Jessica Sidman 
Department of Mathematics and Statistics, Amherst College, Amherst, MA, USA 


Advisory Board 


Colin Adams, Williams College, Williamstown, MA, USA 

Jayadev S. Athreya, University of Washington, Seattle, WA, USA 

Nathan Kaplan, University of California, Irvine, CA, USA 

Jill Pipher, Brown University, Providence, RI, USA 

Jeremy Tyson, University of Illinois at Urbana-Champaign, Urbana, IL, USA 


Undergraduate Texts in Mathematics are generally aimed at third- and fourth- 
year undergraduate mathematics students at North American universities. These 
texts strive to provide students and teachers with new perspectives and novel 
approaches. The books include motivation that guides the reader to an appreciation 
of interrelations among different aspects of the subject. They feature examples 
that illustrate key concepts as well as exercises that strengthen understanding. 


Fernando Q. Gouvêa 


A Short Book on Long Sums 


Infinite Series for Calculus Students 


Springer 


Fernando Q. Gouvêa 
Department of Mathematics 
Colby College 

Waterville, ME, USA 


ISSN 0172-6056 ISSN 2197-5604 (electronic) 
Undergraduate Texts in Mathematics 

ISSN 2945-5839 ISSN 2945-5847 (electronic) 
Readings in Mathematics 

ISBN 978-3-031-37556-9 ISBN 978-3-031-37557-6 (eBook) 


https://doi.org/10.1007/978-3-031-37557-6 


© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature 
Switzerland AG 2023 


This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether 
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse 
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and 
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar 
or dissimilar methodology now known or hereafter developed. 

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication 
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant 
protective laws and regulations and therefore free for general use. 

The publisher, the authors, and the editors are safe to assume that the advice and information in this book 
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or 
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any 
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional 
claims in published maps and institutional affiliations. 


This Springer imprint is published by the registered company Springer Nature Switzerland AG 
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland 


Paper in this product is recyclable. 


Contents 


What This Book Is About 
1 Getting Close with Lines 
   1.1 Approximation: the general setup 
   1.2 The horizontal line approximation 
      1.2.1 Problems 
   1.3 Using the tangent line to approximate 
      1.3.1 Problems 
   1.4 What’s so good about the tangent line? 
   1.5 Examples of tangent line approximation 
      1.5.1 The sine function 
      1.5.2 The exponential function 
      1.5.3 The square root function 
   1.6 Controlling the error in the tangent line approximation 
      1.6.1 Problems 
2 Getting Closer with Polynomials 
   2.1 Approximations of degree 2 
      2.1.1 Problems 
   2.2 Doing better: the general case 
      2.2.1 Problems 
   2.3 Taylor polynomials and derivatives 
      2.3.1 Problems 
   2.4 How close to the function is the Taylor polynomial [rest of title illegible] 
   2.5 What happens as n grows? 
      2.5.1 Problems 
   2.6 Dots, sigmas, and general terms 
      2.6.1 Problems 
   2.7 The easiest examples: sine and cosine 
      2.7.1 The sine function 
      2.7.2 The cosine function 
      2.7.3 The language of convergence 
      2.7.4 Problems 
   2.8 The exponential 
      2.8.1 Problems 
   2.9 The geometric series 
      2.9.1 Problems 
   2.10 The (natural) logarithm 
   2.11 The binomial series 
      2.11.1 Problems 
   2.12 Two monsters 
      2.12.1 Convergence, but to the wrong answer 
      2.12.2 No convergence at all 
      2.12.3 [illegible] the monsters 
   2.13 Series to know in your sleep 
3 Going All the Way: Convergence 
   3.1 Definitions 
      3.1.1 Problems 
   3.2 Some basic properties of convergence 
   3.3 Convergence [illegible] 
   3.4 Convergence: when we can compute the partial sums 
   3.5 Alternating series: a slightly harder “yes” 
      3.5.1 Problems 
   3.6 The harmonic series: a not-so-easy “no” 
      3.6.1 Problems 
   3.7 Series with positive terms, part 1 
      3.7.1 Problems 
   3.8 Series with positive terms, part 2 
      3.8.1 Problems 
   3.9 One way to handle series whose terms are not all positive 
4 Power Series 
   4.1 Convergence of power series 
   4.2 Examples of finding the radius of convergence 
      4.2.1 Problems 
   4.3 Power series [illegible] 
      4.3.1 Problems 
   4.4 [illegible] 
      4.4.1 Problems 
   4.5 Doing things with power series 1 
   4.6 Doing things with power series 2 
      4.6.1 Fibonacci numbers 
      4.6.2 Generating functions 
5 Distant Mountains 
   5.1 [illegible] 
      5.1.1 Problems 
      5.1.2 Notes 
   5.2 Series in general and the idea of uniformity 
      5.2.1 Problems 
      5.2.2 Notes 
   5.3 Periodic functions and Fourier series 
      5.3.1 Problems 
      5.3.2 Notes 
   5.4 [illegible] 
      5.4.1 Notes 
A SageMath: A (Very) Short Introduction 
   A.1 General [illegible] 
   A.2 [illegible] examples 
   A.3 [illegible] 
   A.4 [illegible] 
B Why I Do It This Way 
Bibliography 
Index 


What This Book Is About 


The goal of this short book is to introduce you to power series, a fundamental tool of 
pure and applied mathematics, and to infinite series in general. There are two major 
themes throughout, which might be described as approximation and representation. 
Throughout we will work with a function f(x) of one variable. The book focuses on 
two questions: 


a. Given enough information about the function at a reference point x = a, is 
it possible to know, at least approximately, what the function does when x is 
near a? 


b. Can we use these approximations to find a representation of the function f(x) 
that is (perhaps) more useful than its original one? 


To illustrate the two problems and what an answer would look like, think about 
the decimal expansion of a fraction. Suppose we have a fraction f = 1/3. Then we 
know that 


\[ f = \frac{1}{3} = 0.3333333\ldots \]


This answers both questions. If we want an approximation to the fraction f, we simply 
truncate the expansion at some point. So, 0.333333 is an approximation to 1/3; 
counting the digits, we can even conclude that the approximation is off by no more 
than 10⁻⁶ = 0.000001. 
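
The digit-counting claim can be checked with exact rational arithmetic. The following is a minimal sketch in plain Python (standing in for SageMath); the variable names are illustrative and not part of the text.

```python
from fractions import Fraction

exact = Fraction(1, 3)                # the fraction f = 1/3
approx = Fraction(333333, 1000000)    # the truncation 0.333333 (six digits)

error = exact - approx                # exact difference, no rounding involved
bound = Fraction(1, 10**6)            # the claimed bound 10^(-6)

print(error)            # the exact error, as a fraction
print(error <= bound)   # whether the error respects the bound
```

Because `Fraction` never rounds, the comparison is exact rather than subject to floating-point noise.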

On the other hand, the decimal expansion, taken as an infinite whole, is also a 
representation of the fraction 1/3. The decimal expansion represents a unique real 
number that is exactly our original 1/3, so in a sense it “knows” everything the 
expression 1/3 does. Its form easily reveals things to us (for example, that 1/3 is 
between 3/10 and 35/100) that might have been harder to see from the fraction 
representation. 

The decimal representation is in fact so useful that it sometimes replaces the 
“more exact” fractional representation. This is true for functions too! 

In what follows we will work with a function f(x) instead of a fraction. That 
makes things harder, of course, because a function has many values (one for each 
allowable x). So when we approximate it or represent it, we will have to worry about 
the range of validity: for which values of x is it a good approximation or a valid 
representation? On the other hand, we can use the tools of calculus (especially 
derivatives) to study a function very closely, and that will help us find our 
approximations/representations. 

We will focus almost entirely on one specific kind of approximation, the kind 
known as a power series. This is not the only kind that is important; it is simply 
the easiest and oldest kind. Our goal is to understand how to find a power series 
for a given function and how to use the series when we have it. Like the decimal 
representation of a fraction, our power series will usually be infinite. Indeed, one of 
the names for the subject that includes power series is “infinite series,” even though 
most mathematicians use the word “series” by itself to mean the same thing. 

One big contrast between fractions and functions is that every fraction has a 
decimal representation, but not every function has a power series representation. In 
fact, functions that do are some of the nicest and most well-behaved functions. 

Power series are used a lot, by engineers, physicists, and mathematicians of all 
kinds. Anyone interested in applied mathematics needs to know and understand 
them. So do future mathematicians. Real understanding goes beyond the “here’s how 
to do it” approach of many textbooks. I hope this book will get you started in that 
direction. 

We begin with a careful discussion of the first-order approximation to a function 
given by its tangent line. (Of course, this assumes the function has a tangent line—a 
minimal form of “good behavior” for functions.) For some readers, this may be a 
review, but I hope that seeing it again will help prepare you to move to higher-order 
approximations. 

As we go along, I will either prove things or tell you that certain things are true. 
In mathematics, something we have shown to be true is called a theorem. To help 
signpost these critical conclusions I have labeled them as theorems and given them 
numbers. 

Mathematics today is done with the help of technology. This is especially true 
when we want to do computations, including algebraic manipulation. I will use a 
very powerful free and open-source program called SageMath. See Appendix A for some 
basic information on how to get it and how to use it. Almost all the computations you 
will need to do can be done online on the SageMath Cell site. Of course, other 
mathematical software can do these things too, be it Desmos, GeoGebra, Mathematica, 
Maple, or MATLAB. If you are used to one of those, I think you will have no problem 
translating my examples to your preferred tool. I like SageMath because it is free, 
very powerful, and can be used online without installing anything on your computer. 

We often learn best, however, by first doing some computations by hand. So we 
will begin that way, and then I will show you how to do those computations with 
SageMath as well. 

As in every mathematics book, there are problems. I have avoided including 
many “just do it” problems. You won’t find 20 almost identical exercises. Instead, I 
have tried to provide problems that are interesting, that require some thought, like 
the problems you are likely to encounter in the real world. 

This book has a narrow focus and does not do many proofs or go into technical 
issues. I have included references to other books that can provide these details if you 
want to see them. A reference like [4] points you to an item in the bibliography, in 
this case Lion Hunting and Other Mathematical Pursuits, by Ralph P. Boas, which 
gave me some interesting ideas I used in this book. 


My way of introducing you to infinite series is different from the usual one. 
Professors reading this introduction and wanting to understand why I have done it 
the way I did it should read Appendix B. 

Power series are critically important, but of course there are other kinds of series 
representations for functions. In the last chapter, I have tried to point you towards 
some of them. I hope that you will find learning about series useful and that you will 
be interested in going further. 

Enjoy the ride! 



1 Getting Close with Lines 


Sunset yesterday was at 7:27. What time do you think sunset will be today? 

Of course, that isn’t enough information for an exact answer. We don’t even know 
whether we expect sunset today to be earlier or later. (Is it Spring or Fall?) But we 
can still give a fairly good approximate answer. (Can you see what it is before I tell 
you?) 

The sunset question is a typical approximation problem of the sort we will be 
considering. Here’s how to translate it into a mathematics problem. We have a 
function we want to understand: 


ss(x) = time of sunset on day x. 


(We could have called the function f(x) as usual, but it’s often better to use 
something more memorable; here ss stands for “sunset.”) We know one value. If we decide 
that yesterday was day zero, we know that ss(0) = 7:27. And we want to approximate 
the value of ss(1). 

What can we say? The first guess to make is that sunset today will be at about the 
same time: ss(1) ≈ 7:27. That is reasonable because we know that the time-of-sunset 
function doesn’t change too fast: sunset today can’t be too much later or earlier than 
yesterday. As long as we are trying to find an approximation for a value of x that is 
close to our reference value x = 0, we expect there to be little change. Of course, 
what counts as close depends on the problem. For the time of sunset, one or two days 
is close; for a speeding car, we’d want one or two seconds. 

Can we say more? Well, we can if we have a little more information. Suppose we 
know that it is Fall. Then we know that the length of the day is decreasing, so sunset 
is getting a little bit earlier each day. In math talk, we are saying that the derivative 
ss’(x) is negative right now. So we might add to our approximation that 7:27 is likely 
an overestimate: the true value should be a little bit less. In the same way, if we knew 
it was Spring, we would know we have an underestimate. 

Can we do even better? In order to be more precise, we would need to know even 
more. For example, if we knew the actual value of ss’(0) (so that right now sunset 
is changing by ss’(0) minutes per day), we might be able to make our guess a 
bit more precise, or at least get more information about how far off our guess might 
be. 

Our goal in this chapter and the next is to push this idea as hard as we possibly 
can. We will always have a reference point and we will assume we know a lot of 
information about what is happening there. We will try to use this information to 
come up with a reasonable approximation for what is happening at some nearby 
point. 


1.1 Approximation: the general setup 


Since we will be discussing approximations, it is good to have some vocabulary 
to talk about them. Let’s create a general setup and give names to various quantities 
we will need to worry about. 


• We are given a function f(x) whose values we want to approximate. In the 
graph, I have shown a function f(x) in blue. 


• We have a reference point x = a for which we know something. In the sunset 
example, we have a = 0 and we know the value ss(0). In the graph, I chose 
a = 0 and we can see that f(0) = 1. In chapter 2, we will sometimes refer to 
a as the center, for reasons that will become clear. 


• We have a target point x for which we want to estimate f(x). I sometimes use 
the more colloquial “plug-in point.” In the sunset example, the target point 
was x = 1. In the plot, I have chosen x = 0.3 and drawn a vertical line in green. 


• An important bit of information is the difference between our reference point 
and our target point, x − a. We will refer to this as the increment. In the 
sunset example, the increment was 1; in the plot, it is 0.3. It’s pretty clear 
that this kind of approximation will be easier when the increment is small. 


[Figure: the graph of f(x) in blue, a possible estimate est(x) in red, and a vertical green line at the target point x = 0.3.] 


• Instead of finding f(x) exactly (usually, we can’t do this because we don’t have 
a computable formula for f(x)), we will instead estimate it. That is, we will 
try to come up with some other function est(x) such that f(x) ≈ est(x) when 
x is near a. In the sunset example, we just made est(x) be a constant function 
always equal to f(a). In the plot, I have shown a possible est(x) in red. 


• Whenever we approximate there is an error: the difference between our 
estimate/guess and the real value of f(x). So 


\[ \operatorname{Err}(x) = f(x) - \operatorname{est}(x). \]


In the plot, this is the segment on the green line between the blue and red 
graphs. Often, we care more about the absolute value of the error than about 
its sign. 

In most cases, the hard part is to control the error. That is, we want to somehow 
obtain a formula that says that the absolute value of the error can’t be too big: 


\[ |\operatorname{Err}(x)| \le B. \]


The number B is called an error bound, because it sets a boundary on the error. 
In our examples, it will always depend on the increment x − a. 


An everyday example of error bounds is the “margin of error” that is often given 
when some survey attempts to measure public opinion. When we are told that the 
margin of error is 3 percentage points, that means that the number reported has an 
error that is no bigger than 3. So when we are told that 95% of people like fountain 
pens, the true value might be as low as 92% or as high as 98%. We want to make sure 
that our estimates come with a margin of error. 

The estimates we will be developing will depend on what we know about the 
function at the reference point x = a and on the increment x − a. The error bounds 
will typically depend on knowing something about what happens between a and x. 
That makes sense: if something dramatic happens as a becomes x, there might be 
a huge change, so a big difference between f(a) (which we know) and f(x) (which 
we are estimating). Huge differences are likely to cause errors, and so lead to large 
values of B. 

What makes an estimate good or bad? It all boils down to the error, which we 
hope is small. But we need to make that a little more precise as well. For one thing, 
the position of the vertical green line matters. In other words, the error depends on 
x, and we should take that into account. 

If our estimate est(x) is any good, it should give f(a) exactly when we plug in a. 
(If we can’t predict the past, we are in trouble!) That is: we expect that est(a) = f(a), 
or, equivalently, that Err(a) = 0. That happens in our graph. 

We have the right to expect a bit more, I guess. As x gets closer to a, we would 
expect (or at least prefer!) our estimate to get better. That happens in our example 
graph: imagine moving the green line closer to the y-axis. 

We can express the behavior we want as a limit: we would like 


\[ \lim_{x \to a} \operatorname{Err}(x) = \lim_{x \to a} \bigl(f(x) - \operatorname{est}(x)\bigr) = 0. \]


In other words, we want our error to go to zero as x approaches a. In fact, we will 
even be concerned to find out how fast the error tends to zero. 
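
To see what this limit condition looks like numerically, here is a small sketch in plain Python. The choice of f(x) = exp(x), the reference point a = 0, and the list of increments are all assumptions made for illustration; the estimate is the simplest possible one, a constant equal to f(a).

```python
import math

def f(x):
    # Example function chosen only for illustration (not from the text).
    return math.exp(x)

a = 0.0   # reference point

def est(x):
    # The simplest estimate: pretend the function stays at its reference value.
    return f(a)

def err(x):
    # Err(x) = f(x) - est(x)
    return f(x) - est(x)

increments = [0.5, 0.1, 0.01, 0.001]
errors = [err(a + h) for h in increments]
for h, e in zip(increments, errors):
    print(f"increment {h:<6} error {e:.6f}")
```

The printed errors shrink as the increment shrinks, which is the behavior the limit describes; how quickly they shrink is the "how fast" question raised above.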


1.2. The horizontal line approximation 


Let’s look back at the “just guess it’s constant” example and translate it into the 
general language we have just learned. We did it for the “time of sunset” function, 
but let’s now do it for a general function f(x) and a reference point x = a. 


¹This idea works well for the kind of approximations we have in mind, but there are other options too. 
For example, we might prefer an approximation that is “good on the average” even if it does not have this 
limit property. But for now we’ll stick to this preference. 

We know f(a) and we decide to estimate by saying “it won’t change,” so we set 
est(x) = f(a), a constant function whose graph is a horizontal line. So we have 


\[ \operatorname{Err}(x) = f(x) - \operatorname{est}(x) = f(x) - f(a). \]


Clearly, if we plug in a for x, we get zero, so the first condition is true: we get the 
right answer when x = a. 
How about the limit? Well, what we want is 


\[ \lim_{x \to a} \bigl(f(x) - f(a)\bigr) = 0, \]
which is the same as 
\[ \lim_{x \to a} f(x) = f(a). \]


That will happen when f(x) is continuous. (It’s actually the definition of continuity.) 
So: 


Theorem 1.2.1. The “just assume it’s constant” approximation is acceptable exactly 
when the function f is continuous at x = a. 


If you think about some discontinuous functions you will see that if we don’t 
assume continuity the situation is hopeless. For example, if we had a function that 
jumped at x = a then knowing f(a) would give little or no information about nearby 
values like f(a + small). 


[Figure: graph of a function with a jump at x = 1.] 


In the graph, f(1) = 1, but the value of f(1.01) is close to 5. The jump could have 
been of any size, and just knowing what happens when x = 1 gives us no information 
at all about f(x) when x > 1. 
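
A small numerical sketch makes the point concrete. The function below is hypothetical, loosely modeled on the graph just described (f(1) = 1, with values just to the right of 1 close to 5); it is written in plain Python.

```python
def f(x):
    # Hypothetical function with a jump at x = 1 (an assumption for illustration):
    # f(1) = 1, but values just to the right of 1 are near 5.
    return x if x <= 1 else x + 4

a = 1.0
for h in [0.1, 0.01, 0.001]:
    error = f(a + h) - f(a)   # error of the constant estimate est(x) = f(a)
    print(h, error)
```

No matter how small the increment, the error stays near the size of the jump, so the estimate never improves as the target point approaches the reference point from the right.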

The fact that continuity is required for the “guess it is constant” estimate to work 
is a hint for the work we will do later: good estimates require good properties of the 
function f(x). The properties that matter are exactly the properties that calculus is 
all about: being continuous, having a derivative, and so on. In the case of the degree 
zero approximation we are talking about, continuity is what we need, but for better 
approximations, we will need more. 

Can we control the error in our approximation? Well, not if all we know is that 
f(x) is continuous. But if we know it is differentiable, we can. What we need is one 
of the big calculus theorems you learned but perhaps never saw the point of: the 
Mean Value Theorem. Remember what it says: 


Theorem 1.2.2 (Mean Value Theorem). If a function f is continuous and differentiable 
between a and x then 


\[ \frac{f(x) - f(a)}{x - a} = f'(c) \]


for some value c between a and x. 


The thing in the numerator is exactly the error we want to understand, Err(x) = 
f(x) − f(a). So let’s clear the denominator to get 


\[ \operatorname{Err}(x) = f(x) - f(a) = f'(c)(x - a). \]


That looks like an exact formula for the error, but the problem (as always when we 
use the MVT) is that we don’t know what c is. How can we get around that? 

The key is to find a bound for what f’(c) can be. Suppose we know that the 
absolute value of f’(c) is never more than 5, so |f’(c)| ≤ 5. Then, taking absolute 
values, we see that 


\[ |\operatorname{Err}(x)| = |f(x) - f(a)| = |f'(c)|\,|x - a| \le 5\,|x - a|, \]


which tells us that the error is never more than five times the increment. That is very 
nice: the closer we get, the smaller the error, and we can actually give a margin of 
error for our estimate. 

In general, suppose we can show that |f’(c)| ≤ M for all the relevant values of c (that 
is, all values between a and x; we don’t care about any other values). Then we get 


\[ |\operatorname{Err}(x)| = |f(x) - f(a)| = |f'(c)|\,|x - a| \le M\,|x - a|. \]


So the mean value theorem tells us that if we can control how big the derivative is 
between x and a, then we can also control how big our error is, and the error bound 
will look like M|x − a|. This makes sense: knowing how fast the function changes 
will allow us to put a limit on how much it can change. 

As an aside, this is the whole point of the Mean Value Theorem! It allows us to 
turn information about how the derivative behaves in a certain range into information 
about how much the function changes. 
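
As a rough illustration of how the bound M|x − a| is used, here is a sketch in plain Python. The function f(x) = sqrt(x), the reference point a = 4, and the helper name `constant_estimate_bound` are assumptions chosen for the example, not taken from the text.

```python
import math

def constant_estimate_bound(M, a, x):
    # Error bound M * |x - a| for the estimate est(x) = f(a),
    # valid when |f'(c)| <= M for every c between a and x.
    return M * abs(x - a)

# Illustrative choice (not from the text): f(x) = sqrt(x) near a = 4.
# Here f'(c) = 1 / (2 * sqrt(c)), which is at most 1/4 whenever c >= 4.
a, x, M = 4.0, 4.1, 0.25

bound = constant_estimate_bound(M, a, x)
actual_error = math.sqrt(x) - math.sqrt(a)

print(bound)          # the guaranteed margin of error
print(actual_error)   # the true error, which should not exceed the bound
```

The only calculus input is the value of M: a bound on the derivative over the interval between a and x turns directly into a margin of error for the estimate.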

Let’s finish with a couple of examples. For a convincing example, we want a 
function we don’t actually know how to compute by hand, since for the ones we can 
compute we don’t need to approximate. Trigonometric functions are like that: we can 
get them with a calculator or computer, but most of us wouldn’t want to (or know how 
to) compute them by hand. Suppose f(x) = cos(x). Then I know f(0) = cos(0) = 1, 
but the only way I know to compute cos(0.1) is to ask a computer. But let’s try to see 
what our “make believe it’s constant” estimate tells us. 

First a picture. Here is a graph of y = cos(x) (in blue) together with the constant 
function y = 1 (in red). 


[Figure: graph of y = cos(x) in blue with the horizontal line y = 1 in red.] 


You can see from the picture that the approximation is not bad near the reference 
point x = 0, but gets worse fairly quickly as we move away from that point. 
Our estimate would be that 


\[ \cos(0.1) \approx 1. \]


The error, then, will be 


\[ \operatorname{Err}(0.1) = \cos(0.1) - \cos(0) = \cos(0.1) - 1, \]


which we can see will be negative, because the horizontal line is above the graph. 
Can we control the size of the error? 
The derivative of cos(x) is −sin(x), whose absolute value is never bigger than one. So 


\[ |f'(c)| = |{-\sin(c)}| = |\sin(c)| \le 1; \]


that is, we can take M = 1. (The reason for choosing cosine as our first example is 
exactly to get an easy bound for the derivative.) Putting that into the MVT estimate 
tells us that the error is at most 1 · |0.1 − 0| = 0.1. So we have shown that cos(0.1) 
is about equal to 1, with error at most 0.1. We could summarize it all in a single 
formula: 


\[ \cos(0.1) = 1 \pm 0.1. \]


Is that any good? Well, SageMath tells me 

sage: cos(0.1) 
0.995004165278026 
sage: cos(0.1)-1 
-0.00499583472197418 


So, in fact, this worked very well: our value is off by only a small amount that is 
indeed smaller (a lot smaller!) than the bound we found. 

That was sort of a success. What happens if we do sin(0.1) instead? Let f(x) = 
sin(x). This picture is very different: the red line is the x-axis now. The red is still 
there, but it’s hard to see. 


We would guess from the picture that the error will be a lot bigger this time. 
Let’s see. Since f(0) = sin(0) = 0, our “make believe it’s constant” estimate would 
be sin(0.1) ≈ 0. The error, then, will be sin(0.1) − 0 = sin(0.1).² The derivative of 
sin(x) is cos(x), which, as before, is never bigger than one in absolute value. So 
|f’(c)| = |cos(c)| ≤ 1; putting that into the MVT estimate tells us that the error is 
at most 1 · |0.1 − 0| = 0.1, just as before. So we have shown that 


\[ \sin(0.1) = 0 \pm 0.1. \]


But this time, SageMath tells me that 


\[ \sin(0.1) = 0.0998334166468282, \]


which is also the error. So, while our estimate really is within 0.1 of the truth, the 
error is actually pretty big, very close to the maximum possible value 0.1. So we 
might want to do a bit better than this. We’ll talk about how to do that next. 
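
The contrast between the two examples can be summarized by asking what fraction of the allowed margin each estimate actually uses. The sketch below is plain Python rather than SageMath, and the helper name `fraction_of_bound_used` is made up for this illustration.

```python
import math

def fraction_of_bound_used(f, a, x, M):
    # Ratio |Err(x)| / (M |x - a|) for the constant estimate est(x) = f(a).
    error = f(x) - f(a)
    return abs(error) / (M * abs(x - a))

cos_ratio = fraction_of_bound_used(math.cos, 0.0, 0.1, 1.0)
sin_ratio = fraction_of_bound_used(math.sin, 0.0, 0.1, 1.0)

print(cos_ratio)   # small: the cosine estimate uses little of its margin
print(sin_ratio)   # close to 1: the sine estimate uses almost all of it
```

Both estimates come with the same error bound, 0.1, but only the cosine estimate stays far inside it.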


²It’s a bad sign when your error is the same as the thing you are trying to compute, but let’s plow on. 



1.2.1. Problems 


Problem 1.2.1: I said above that the “make believe it’s constant” approximation 
doesn’t work if the function f(x) has jumps. Does the “time of sunset” function ever 
jump? 


Problem 1.2.2: In the Maine portion of I-95, exits are labeled by mileage, from south 
to north. So exit 130 is three miles further north than exit 127. A car drives by exit 
127 heading north at 70 miles per hour. 


a. Why is it reasonable to estimate that the car will reach exit 130 in 2.6 minutes? 


b. If, in fact, the car arrives there in 2 minutes, what would you conclude? What 
if it took 5 minutes? 


c. What information would allow you to make a better prediction of the arrival 
time at exit 130? 


1.3. Using the tangent line to approximate 


As we pointed out in the previous section, we want a method for approximation 
that goes beyond the “just assume it’s constant” estimate. The idea is to use the trend 
as well as the value at a. The trend is captured by the derivative, so it’s easy to turn 
this idea into mathematics. 

Suppose our function f(x) is differentiable and we know not only the value f(a) 
but also the current rate of change f'(a). Then we can make a slightly better estimate 
by using that information. 

To go back to our example, let’s say that the sunset on August 26 was at 7:27. 
Suppose we also know that on that day the time of sunset was getting smaller (i.e., 
earlier) by about 2 minutes per day. Then if we want the time of sunset on August 28 
we just subtract 2 minutes per day: 


7:27 — (2 minutes per day) x (2 days) = 7:23. 


In math talk, that boils down to this: we know f(a), the value at our reference point, 
and we also know f’(a), the rate of change at our reference point. So the right way 
to approximate f(x) is to start with f(a) and then add the expected change as we 
move between a and x. Since the rate of change is f’(a), the result is 


$$f(x) \approx f(a) + f'(a)(x - a).$$


It is really ≈ and not =, because the rate of change is f'(a) when x = a, and for this
estimate to be exactly right we would need to know that the rate is the same for all 
relevant values of x. So the formula really does give only an approximation: it would 
be exact if the rate of change were always the same (i.e., if the graph is a line), but 
that won’t usually be the case. 

We can phrase this idea in terms of increments: when we increment the input by
(x − a), we expect the corresponding change in the value of the function to be roughly
f'(a)(x − a). This amounts, of course, to assuming that the function is behaving like
a straight line, with a constant rate of change. That may not be quite true, but we are
hoping (implicitly) that it isn’t too far from true, i.e., that the rate of change does not 
itself change too quickly. 

Geometrically, the estimate we are making looks like this: we know a point in 
the graph of our function, namely, (a, f(a)). We also know the slope at that point: 
it is f’(a). As you will remember, “slope at a point” means the slope of the tangent 
line at that point. So, we can use the point-slope formula to write down the equation 
of the tangent line: 

$$y - f(a) = f'(a)(x - a),$$
or, rearranging a bit,
$$y = f(a) + f'(a)(x - a).$$


That’s exactly our estimate! 

So we can see our estimate easily in a picture: we are saying that the tangent line 
to the graph of a function stays close to the function, at least if we step away only a 
little from the reference point. 

Let’s graph an example. Take f(x) = x³ − x (in blue in the picture) and a = −1,
so f(a) = 0 and f'(a) = 2. Suppose we want to estimate f(−0.8). The equation of
the tangent line (our estimate, in green in the picture) is 


$$y - 0 = 2(x - (-1)),$$


which just gives y = 2x + 2. 

I added a red vertical line at x = −0.8. Our increment is x − a = 0.2, so our estimate
is f(−0.8) ≈ 0 + 2(0.2) = 0.4, the point where the green tangent line intersects the
red line. The actual value (the point where the red line intersects the blue curve) is
0.288. The error is −0.112. Right now we don’t have any way of controlling the error
without actually computing it. 
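
The arithmetic in this example is easy to check by machine. The following is a minimal Python sketch, not taken from the text; the function names `f`, `f_prime`, and `tangent_estimate` are chosen here for illustration only.

```python
def f(x):
    """The example function f(x) = x^3 - x."""
    return x**3 - x


def f_prime(x):
    """Its derivative, f'(x) = 3x^2 - 1."""
    return 3 * x**2 - 1


def tangent_estimate(a, x):
    """Tangent line approximation f(a) + f'(a)(x - a)."""
    return f(a) + f_prime(a) * (x - a)


a, x = -1.0, -0.8
estimate = tangent_estimate(a, x)
actual = f(x)
error = actual - estimate  # Err(x) = actual value minus estimate

print(f"estimate = {estimate:.3f}")
print(f"actual   = {actual:.3f}")
print(f"error    = {error:.3f}")
```

Rounded to three decimals, the printed values should agree with the 0.4, 0.288, and −0.112 found above.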

Let’s zoom in on the portion of the picture that contains our approximation. The 
approximation is the green line. The actual value is given by the blue curve. The 
error is the line segment on the red line between the intersections with the green and 
blue graphs. 


Here are some things to notice in the picture: 


a. If we look at x = −1 the curve and the tangent line are at the same value. In
other words, our estimate is exact when x = a, or Err(a) = 0.


b. If we slide the red line closer to x = −1, the error gets smaller and smaller. In
other words, Err(x) → 0 as x → a.


c. It is hard to decide from the picture how quickly the error goes to zero. We 
might try computing the error for various values of x to see that. 


d. Our estimate is going to be too big, at least for x near −1. That is because the
tangent line we drew is above the blue graph. Can you see what feature of the 
blue graph causes that? 


Of course, “noticing in the picture” is not proof that these things will always happen. 
We'll get to that. 

Let’s see what happens with the example sin(0.1) that we tried before. We know
sin(0) = 0 and cos(0) = 1, so the tangent line approximation is


$$\sin(x) \approx 0 + 1(x - 0) = x.$$


Here is the picture, with the sine curve in blue, the tangent line in green, and the line
x = 0.1 in red:


[Figure: the graph of sin(x) (blue), the tangent line y = x (green), and the vertical line x = 0.1 (red); both axes run from −2.0 to 2.0.]


One can hardly see the error at this scale: along the red vertical line, the blue and 
green graphs seem to be at the same point. (I tried zooming in by a factor of ten, and 
still couldn’t see a space between the two.) 

Of course, we expect that there is a difference, even if it is very small. And indeed,
the approximation we get is sin(0.1) ≈ 0.1, and the error is about −0.000166, much
smaller than before. You should check that this picture also shows the three features 
we noted above. 

To summarize: 


Tangent Line Approximation: Suppose that we are given a differentiable func- 
tion f(x) and that, for a certain value a, we know f(a) and f’(a). The tangent line 
approximation to f(x) near a is 


$$f(x) \approx f(a) + f'(a)(x - a).$$

If we want to take into account the error, we write instead
$$f(x) = f(a) + f'(a)(x - a) + \mathrm{Err}(x),$$


where we expect Err(x) to be small when x is close to a. 

The tangent line approximation is sometimes encoded into something called the 
differential of a function. It is tempting to include that here, but let’s resist that temp- 
tation. 


1.3.1 Problems 


Problem 1.3.1: Use a tangent-line approximation to estimate the value of √4.1.
Compare the result with what you get from your computer. Is your estimate larger or 
smaller than the computer’s value? Can you explain why? 


Problem 1.3.2: Suppose you put a yam in a hot oven, maintained at a constant tem- 
perature of 200°C. As the yam picks up heat from the oven, its temperature rises. 


a. Draw a possible graph of the temperature T of the yam against time t (in minutes)
since it is put into the oven. Explain any interesting features of the graph,
and in particular, explain its concavity. 


b. Suppose that, at t = 30, the temperature T of the yam is 120°C and increasing 
at the rate of 2°C/min. Using this information, estimate the temperature at time 
t= 40. 


c. Suppose in addition you are told that at time t = 60 the temperature of the yam 
is 165°C. Can you improve your estimate of the temperature at time t = 40?


d. Assuming all the data given so far, estimate the time at which the temperature 
of the yam is 150°C. 


Problem 1.3.3: The acceleration due to gravity, g, is given by 


$$g = \frac{GM}{r^2},$$


where M is the mass of the Earth, r is the distance from the center of the earth, 
and G is a constant (the universal gravitational constant). Show that if we change r
by a small amount Δr, the change Δg in the acceleration is approximately equal to
−2gΔr/r. Use this approximation to compute the percentage change in g when you
move from sea level to the top of Pikes Peak, which is 4.315 km above sea level. 
(You'll need to know that the radius of the Earth is 6400 km.) 


1.4 What’s so good about the tangent line?


When we use the tangent line approximation, we get something that (we hope) 
gives us a reasonable value for the value of a function near a reference point a: 


$$f(x) \approx f(a) + f'(a)(x - a).$$


This has our two required features: when x = a it is exact, and when x → a the error
goes to zero. But here’s the mystery: any line through the point (a, f(a)) does the 
same thing! Let’s go back to the picture from last time and add one more line: 


The orange line is not the tangent line at x = −1, but it is also close to the blue
curve. In fact, for our target value x = −0.8 it does a better job! So what’s so good
about the tangent line?

Looking at the picture very closely, you might (if you have very sharp eyes)
notice this: if you slide the red line toward x = −1, both lines get closer to the curve,
but the tangent line does it faster, especially as we get really close. This is very hard 
to see, but luckily mathematics is sharper than our eyes. 

Let’s translate the question. We have a (differentiable) function f(x) and a refer- 
ence point a. The tangent line approximation is 


$$f(x) \approx f(a) + f'(a)(x - a).$$
The potential competitor, a secant line approximation, is
$$f(x) \approx f(a) + m(x - a),$$


where m is the slope of the secant line. (Of course in the picture I chose a particularly 
tempting m!) The two error terms are 


$$\mathrm{Err}(x) = f(x) - f(a) - f'(a)(x - a)$$
and
$$\mathrm{Err}_m(x) = f(x) - f(a) - m(x - a).$$


These are both the same formula, really. The first just chooses a special value for the 
constant m. If we plug in x = a, then both formulas just give us zero. And since f(x) 
is continuous it’s clear that when x approaches a both error terms go to zero in the 
limit. 

The question we want to ask is: how fast do they go to zero? What does that even 
mean? 


Detour: How fast does it go to zero? 


Well, how do we measure that? Suppose we take a number h. Clearly, as h goes to 
zero, it goes to zero. (Doh!) What function f(h) goes to zero when h does, but does
it faster/slower? 

Let’s graph some easy functions of h, namely h, h² and √h. I asked SageMath
to do it, like this:


# SageMath: plot h, h^2 and sqrt(h) for h between 0 and 0.5
var('h')
plot([h, h^2, sqrt(h)], (h, 0, 0.5))


[Figure: graphs of h (blue), h² (green), and √h (red) for h between 0 and 0.5.]

The graph suggests that h² (green) goes to zero faster than h (blue), while √h
(red) does it slower: for small positive values of h we always have h² < h < √h.
Notice that this is the opposite of our usual feeling that h² is bigger than h. That is
true when h > 1, but not near zero!


One way to measure that is to look at the quotient. Suppose we have two functions
n(h) and d(h), both of which have limit zero when h → 0. What happens to the limit


$$\lim_{h \to 0} \frac{n(h)}{d(h)} \, ?$$

The numerator n(h) is getting closer to zero, so it wants to make the fraction get
closer to zero. The denominator d(h), however, is also going to zero, and that tends 
to make the fraction bigger. So the limit is like a race: the numerator wants to make 
the fraction be zero, while the denominator wants to make it infinitely large. Who 
wins? 

Of course, the limit may not exist, but let’s say it does. If the limit is zero, that 
means n(h) has beaten d(h). It approached zero faster. If the limit is infinity, then 
d(h) has won the race. If it is some other thing (a number like 3), then we have some
sort of almost-tie. 

Thinking that way, we can confirm what we saw in the picture: 


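$$\lim_{h \to 0^+} \frac{h^2}{h} = \lim_{h \to 0^+} h = 0,$$


but


$$\lim_{h \to 0^+} \frac{\sqrt{h}}{h} = \lim_{h \to 0^+} \frac{1}{\sqrt{h}} = +\infty.$$


So h² approaches zero faster than h, while √h goes to zero slower than h. It’s easy
to see that this is true in general: if a > b then h^a goes to zero faster than h^b.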

In our situation, we will have n(h) = Err(a + h), which goes to zero when h =
x − a goes to zero, and d(h) = some power of |h| = |x − a|. As we just saw, higher
powers go to zero faster than lower powers. Taking the limit, we can decide how fast 
the error approaches zero. 
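
The race between numerator and denominator can also be watched numerically. This is a small Python sketch under the assumption that h is positive; the helper name `ratios` is invented here for illustration.

```python
import math


def ratios(h):
    """Return (h^2 / h, sqrt(h) / h) for a positive h."""
    return h**2 / h, math.sqrt(h) / h


# As h shrinks, the first quotient heads to zero and the second blows up.
for h in [0.1, 0.01, 0.001, 0.0001]:
    fast, slow = ratios(h)
    print(f"h = {h:<7} h^2/h = {fast:<10.6f} sqrt(h)/h = {slow:.2f}")
```

The first quotient shrinks along with h, which is the numerical face of "h² wins the race," while the second grows without bound.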


End detour 


Let’s use this idea to figure out how quickly the error goes to zero in our situation
with f(x), the tangent line, and another line of slope m.

What serves as h in our case is the difference x − a, which is what is getting
closer to zero. To say (x − a) → 0 is the same as x → a. So we want to compute the
limit

$$\lim_{x \to a} \frac{\mathrm{Err}_m(x)}{x - a} = \lim_{x \to a} \frac{f(x) - f(a) - m(x - a)}{x - a}.$$


If we work this out for general m then we can also know what happens when 
m = f'(a). We compute the limit by breaking the fraction into two pieces:


$$\frac{f(x) - f(a) - m(x - a)}{x - a} = \frac{f(x) - f(a)}{x - a} - \frac{m(x - a)}{x - a} = \frac{f(x) - f(a)}{x - a} - m.$$


The number m is fixed, so that part of the limit is easy. If we recognize the first part
as exactly the thing that appears in the definition of the derivative, it’s easy to take
the limit:


$$\lim_{x \to a} \frac{\mathrm{Err}_m(x)}{x - a} = \lim_{x \to a} \left( \frac{f(x) - f(a)}{x - a} - m \right) = f'(a) - m.$$


Notice what that tells us: the limit will be zero exactly when m = f'(a). So for a 
random line, we don’t get zero, which means the error and (x — a) go to zero at about 
the same rate. But when we use the tangent line with m = f'(a), the error goes to
zero faster than the increment (x — a). In other words, the error in the tangent line 
approximation goes to zero faster than (x — a), and the tangent line is the only line 
with this property. 
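
A numerical experiment makes the contrast visible. The Python sketch below reuses the earlier example f(x) = x³ − x with a = −1; the competing slope 1.44 is an assumption made here for illustration (it is the slope of the secant through (−1, 0) and (−0.8, 0.288)), not necessarily the slope of the orange line in the picture.

```python
def f(x):
    """The earlier example function f(x) = x^3 - x."""
    return x**3 - x


def error_ratio(m, a, x):
    """Err_m(x) / (x - a) for the line through (a, f(a)) with slope m."""
    return (f(x) - f(a) - m * (x - a)) / (x - a)


a = -1.0
tangent_slope = 2.0  # f'(-1) = 3(-1)^2 - 1
other_slope = 1.44   # hypothetical competitor, chosen for illustration

for step in [0.2, 0.02, 0.002, 0.0002]:
    x = a + step
    print(f"x - a = {step:<7}"
          f" tangent: {error_ratio(tangent_slope, a, x):+.6f}"
          f"  other: {error_ratio(other_slope, a, x):+.6f}")
```

For the tangent slope the quotient heads to zero, while for the other slope it settles near f'(a) − m = 0.56, as the limit computation predicts. Note that the competitor is exact at x = −0.8 and still loses the race as x → a.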

The usual slogan is: the tangent line is the best linear approximation to f(x) near 
a. Out of all possible lines, it is the only one that has the property that the error gets 
small faster than x — a does. In fact, looking at the limit we just computed we can 
see that this is pretty much the definition of the derivative f'(a). Let me say it this 
way: 


Theorem 1.4.1. The derivative of f(x) at a is the only value of m for which
$$f(x) = f(a) + m(x - a) + \mathrm{Err}(x)$$
with
$$\lim_{x \to a} \frac{\mathrm{Err}(x)}{x - a} = 0.$$


The idea of a best linear approximation is one of the key ways to understand what 
the derivative of a function of more than one variable should be. 


1.5 Examples of tangent line approximation 


It’s always good to see some examples. Let’s look at three of them. In each case, 
I have chosen a function that we don’t know how to evaluate by hand but our calcu- 
lators and computers do. This allows me to create tables giving both the “correct” 
values (by which I mean the ones my computer gives, to some level of precision) 
and the approximated values. Subtracting, we get the error. We expect the error to 
be small, at least when the plug-in point x is close to the reference point a. In fact, 
we have shown that it goes to zero faster than linearly as x approaches a. 

In order to be able to write down a computable formula for the tangent line 
approximation 


$$f(a) + f'(a)(x - a),$$
I'll need to choose a so that it is a number I can compute with (so it shouldn’t be
something like π/2) and also so that the values f(a) and f'(a) are easy to evaluate.
As you will see, this severely restricts which a we can use. 

Of course, the theorem is true for any a. The only reason to choose “nice” values 
for a is that we want to be able to compute the approximate value “by hand,” just 
adding and multiplying. My tables were created with Excel. 

I encourage you to choose some other function and make your own tables. 


1.5.1 The sine function 


Let f(x) = sin(x) and so f'(x) = cos(x). The good choice of reference point is a = 0,
since we know that sin(0) = 0 and cos(0) = 1. The tangent line approximation is then 


$$\sin(x) \approx 0 + 1(x - 0) = x,$$

   x        sin(x)         sin(x) − x     (sin(x) − x)/x
 point     function           error       relative error
 1.00    0.8414709848    -0.1585290152    -0.1585290152
 0.90    0.7833269096    -0.1166730904    -0.1296367671
 0.80    0.7173560909    -0.0826439091    -0.1033048864
 0.70    0.6442176872    -0.0557823128    -0.0796890182
 0.60    0.5646424734    -0.0353575266    -0.0589292110
 0.50    0.4794255386    -0.0205744614    -0.0411489228
 0.40    0.3894183423    -0.0105816577    -0.0264541442
 0.30    0.2955202067    -0.0044797933    -0.0149326445
 0.20    0.1986693308    -0.0013306692    -0.0066533460
 0.10    0.0998334166    -0.0001665834    -0.0016658335
 0.00    0.0000000000     0.0000000000      undefined


Table 1.1: sin(x) is close to x when x is close to 0


that is,


$$\sin(x) \approx x \quad \text{when } x \approx 0.$$


We then expect that the error divided by x − 0 = x should go to zero, and it does:


$$\lim_{x \to 0} \frac{\sin(x) - x}{x} = \lim_{x \to 0} \left( \frac{\sin(x)}{x} - 1 \right) = 1 - 1 = 0.$$
Look at Table 1.1. Notice that the error gets smaller and smaller, but the remarkable 
thing is the last column, which shows that the relative error is also approaching 0. 
Can you see why the errors are all negative? Why didn’t I compute the values and 
errors for negative values of x? 
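
Tables like Table 1.1 are easy to regenerate. The text's tables were made with Excel; the following is an alternative sketch in Python, with the helper name `sine_row` invented for illustration.

```python
import math


def sine_row(x):
    """Return (sin(x), error, relative error) for the estimate sin(x) ~ x."""
    value = math.sin(x)
    error = value - x
    # The relative error divides by x, so it is undefined at the reference point.
    relative = error / x if x != 0 else None
    return value, error, relative


for k in range(10, -1, -1):
    x = k / 10
    value, error, relative = sine_row(x)
    rel_text = "undefined" if relative is None else f"{relative:.10f}"
    print(f"{x:.2f}  {value:.10f}  {error:.10f}  {rel_text}")
```

Changing the range in the loop is a quick way to explore values of x that the table leaves out.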


1.5.2 The exponential function 


If f(x) = e^x, then f'(x) = e^x as well, and the convenient choice of reference point
is a = 0 again, since the only value of e^x that has a simple exponent and a simple
answer is e^0 = 1. Then f(0) = f'(0) = 1, so the tangent line approximation is


$$e^x \approx 1 + 1(x - 0) = 1 + x \quad \text{when } x \approx 0.$$


See Table 1.2. Notice that, while the errors and relative errors do approach zero, they 
do it far less enthusiastically than the ones for the sine function. We’ll eventually 
figure out why that is. 


1.5.3. The square root function 


Now let f(x) = √x. The derivative is a bit less nice: f'(x) = (1/2)x^(−1/2) = 1/(2√x). Notice
that plugging in 0 is not a good idea! The function is not differentiable at a = 0, so
nothing we have done so far would work.

    x         e^x         e^x − (1 + x)   (e^x − (1 + x))/x
  point     function          error        relative error
  1.00    2.7182818285    0.7182818285     0.7182818285
  0.95    2.5857096593    0.6357096593     0.6691680624
  0.90    2.4596031112    0.5596031112     0.6217812346
  0.85    2.3396468519    0.4896468519     0.5760551199
  0.80    2.2255409285    0.4255409285     0.5319261606
  0.75    2.1170000166    0.3670000166     0.4893333555
  0.70    2.0137527075    0.3137527075     0.4482181535
  0.65    1.9155408290    0.2655408290     0.4085243523
  0.60    1.8221188004    0.2221188004     0.3701980007
  0.55    1.7332530179    0.1832530179     0.3331873052
  0.50    1.6487212707    0.1487212707     0.2974425414
  0.45    1.5683121855    0.1183121855     0.2629159678
  0.40    1.4918246976    0.0918246976     0.2295617441
  0.35    1.4190675486    0.0690675486     0.1973358531
  0.30    1.3498588076    0.0498588076     0.1661960253
  0.25    1.2840254167    0.0340254167     0.1361016668
  0.20    1.2214027582    0.0214027582     0.1070137908
  0.15    1.1618342427    0.0118342427     0.0788949515
  0.10    1.1051709181    0.0051709181     0.0517091808
  0.05    1.0512710964    0.0012710964     0.0254219275
  0.00    1.0000000000    0.0000000000       undefined
 -0.05    0.9512294245    0.0012294245    -0.0245884900
 -0.10    0.9048374180    0.0048374180    -0.0483741804
 -0.15    0.8607079764    0.0107079764    -0.0713865095
 -0.20    0.8187307531    0.0187307531    -0.0936537654
 -0.25    0.7788007831    0.0288007831    -0.1152031323
 -0.30    0.7408182207    0.0408182207    -0.1360607356
 -0.35    0.7046880897    0.0546880897    -0.1562516849
 -0.40    0.6703200460    0.0703200460    -0.1758001151
 -0.45    0.6376281516    0.0876281516    -0.1947292258
 -0.50    0.6065306597    0.1065306597    -0.2130613194
 -0.55    0.5769498104    0.1269498104    -0.2308178371
 -0.60    0.5488116361    0.1488116361    -0.2480193935
 -0.65    0.5220457768    0.1720457768    -0.2646858104
 -0.70    0.4965853038    0.1965853038    -0.2808361483
 -0.75    0.4723665527    0.2223665527    -0.2964887370
 -0.80    0.4493289641    0.2493289641    -0.3116612051
 -0.85    0.4274149319    0.2774149319    -0.3263705082
 -0.90    0.4065696597    0.3065696597    -0.3406329553
 -0.95    0.3867410235    0.3367410235    -0.3544642352
 -1.00    0.3678794412    0.3678794412    -0.3678794412


Table 1.2: e^x is close to 1 + x when x is close to 0

   x         √x        √x − (2 + (x − 4)/4)   error/(x − 4)
 point    function           error            relative error
 3.00    1.732050808     -0.017949192          0.017949192
 3.10    1.760681686     -0.014318314          0.015909238
 3.20    1.788854382     -0.011145618          0.013932023
 3.30    1.816590212     -0.008409788          0.012013982
 3.40    1.843908891     -0.006091109          0.010151848
 3.50    1.870828693     -0.004171307          0.008342613
 3.60    1.897366596     -0.002633404          0.00658351
 3.70    1.923538406     -0.001461594          0.004871979
 3.80    1.949358869     -0.000641131          0.003205655
 3.90    1.974841766     -0.000158234          0.001582342
 4.00    2                0                     undefined
 4.10    2.024845673     -0.000154327         -0.001543269
 4.20    2.049390153     -0.000609847         -0.003049234
 4.30    2.073644135     -0.001355865         -0.004519549
 4.40    2.097617696     -0.002382304         -0.005955759
 4.50    2.121320344     -0.003679656         -0.007359313
 4.60    2.144761059     -0.005238941         -0.008731568
 4.70    2.167948339     -0.007051661         -0.010073802
 4.80    2.19089023      -0.00910977          -0.011387212
 4.90    2.213594362     -0.011405638         -0.012672931
 5.00    2.236067977     -0.013932023         -0.013932023


Table 1.3: √x is close to 2 + (1/4)(x − 4) when x is close to 4


Since √x appears in both the function and the derivative, what we need is a value
of a that is easy to work with (say, a positive integer) and whose square root is easy
to find. So the best choice is a = n² for some positive integer n.


Let’s choose a = 4. Then f(4) = √4 = 2 and f'(4) = 1/4. So the tangent line
approximation is


$$\sqrt{x} \approx 2 + \frac{1}{4}(x - 4) \quad \text{when } x \approx 4.$$
Another way to write this that I kind of like is


$$\sqrt{4 + h} \approx 2 + \frac{h}{4},$$


of course when h ≈ 0. Look at Table 1.3 to see what happens.
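
The increment form of the approximation is convenient to experiment with. Here is a minimal Python sketch; the name `sqrt_estimate` is made up for this illustration.

```python
import math


def sqrt_estimate(h):
    """Tangent line estimate of sqrt(4 + h), based at a = 4."""
    return 2 + h / 4


for h in [1.0, 0.5, 0.1, -0.1, -0.5, -1.0]:
    actual = math.sqrt(4 + h)
    error = actual - sqrt_estimate(h)
    print(f"h = {h:+.1f}  sqrt = {actual:.9f}  estimate = {sqrt_estimate(h):.4f}"
          f"  error = {error:.9f}  error/h = {error / h:.9f}")
```

The last printed column is the relative error from Table 1.3, since h = x − 4.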


Do it yourself! (i.e., Problems)


Problem 1.5.1: Let f(x) = ln(1 + x), and let a = 0. (Why is that the right value of
a?) Find the tangent line approximation near a, and build a table like the ones we 
found above. 


Problem 1.5.2: What other functions are worth doing? 


Problem 1.5.3: Since we know the error goes to zero faster than |x − a|, we could
try comparing it to (x − a)². For each of the examples above, use a spreadsheet to
create a table that includes one more column with Err(x)/(x − a)². What happens?


1.6 Controlling the error in the tangent line approximation


So here’s where we are. We are given a function f(x) that we want to approximate
and a reference point x = a. If we know the value f(a) and the derivative f'(a), we
can approximate f(x) by the tangent line and write


$$f(x) = f(a) + f'(a)(x - a) + \mathrm{Err}(x),$$
so that
$$\mathrm{Err}(x) = f(x) - \left( f(a) + f'(a)(x - a) \right).$$
We'll keep using the notation Err(x) to avoid having to write it out all the time. 
We know that the error goes to zero faster than linearly, which just means that


$$\lim_{x \to a} \frac{\mathrm{Err}(x)}{x - a} = 0.$$


The goal of this section is to find an actual bound for the error. That will, as usual, 
require us to know a little bit more about the function. 

Remember that we can think of the tangent line approximation as “make believe 
it’s a straight line.” So if f(x) actually were a straight line, there would be no error. 
That’s because, on a straight line, the derivative f’(x) is always the same: it’s the 
slope of the line. So the source of the error is the fact that f’(x) changes as x varies. 
The rate of this change is of course the second derivative f''(x). So it would be
reasonable to expect that the bound for the error will depend on the second derivative. 

(Well, at least if we assume that our function has a second derivative. For our 
argument, we’ll assume that it does, and even that f’’(x) is a continuous function.) 

There is one notational quirk we’ll have to worry about. Since we have a reference
point a and a target point x, when we need to refer to some other point between the
two we can’t use either letter. So I’ll use t for a generic point, which will always be
a point between x and a. 

When we looked at the “pretend it’s constant” approximation, we saw that the
error could be controlled if we could bound the size of the derivative f’(t) when t 
is between a and x. So it’s reasonable to guess that if we can bound the size of the 
second derivative in the relevant range (between a and x) then we can control the 
error for the tangent line approximation. I’m going to give a proof of this, but let me 
say right up front that it isn’t the world’s most beautiful proof. (As far as I can tell 
there are no beautiful proofs of this, though there are trickier proofs that go faster 
than mine does.) 

For this argument, let’s assume that x is bigger than a. The other case is exactly 
the same, but we want to write down integrals and need to make sure of the order of 
the values. 


[Figure: a number line with the generic point t marked between a and x.]


Our assumption is going to be that we can limit the size of the second derivative 
f''(t) for all t between a and x. In other words, we assume that we know some number
M with the property that whenever a ≤ t ≤ x, we have


$$-M \le f''(t) \le M.$$


(Tricky bit: once we have this, notice that it is also true if we replace x by a smaller
number between a and x: a bound that holds when a ≤ t ≤ x also holds between a
and t. I’ll point out where in the argument we need to use this.)

Start with the inequality


$$-M \le f''(t) \le M$$


and integrate from a to x. The constants are easy:


$$\int_a^x M \, dt = M(x - a)$$


and the same for −M. For the middle term, we use the fundamental theorem of
calculus: when f''(t) is continuous, we have


$$\int_a^x f''(t) \, dt = f'(x) - f'(a).$$
Putting the results together, we get
$$-M(x - a) \le f'(x) - f'(a) \le M(x - a).$$


But remember (the tricky point) we could do the same thing for any other value
between a and x, so we actually know that whenever a ≤ t ≤ x, we have


$$-M(t - a) \le f'(t) - f'(a) \le M(t - a).$$

Well, it worked once, let’s try one more time: integrate this between a and x. This
is a bit harder, but not much:


$$\int_a^x M(t - a) \, dt = M \int_a^x (t - a) \, dt = \frac{M}{2}(x - a)^2,$$


and similarly for −M.
For the middle term, remember that f'(a) is just a constant, so it just gets multiplied
by x − a. For the rest, use the fundamental theorem once more:


$$\int_a^x \left( f'(t) - f'(a) \right) dt = \int_a^x f'(t) \, dt - \int_a^x f'(a) \, dt = f(x) - f(a) - f'(a)(x - a).$$
Putting together the inequalities, we end up with


$$-\frac{M}{2}(x - a)^2 \le f(x) - f(a) - f'(a)(x - a) \le \frac{M}{2}(x - a)^2.$$


The thing in the middle is exactly the error in the tangent line approximation, so this 
gives lower and upper bounds for the error. Very nice! 
Let’s write it out as a theorem. We have found out that: 


Theorem 1.6.1. Let f(x) be a function whose first and second derivatives exist for
all t between a and x. Suppose we know that |f''(t)| ≤ M for any t between a and x.
Then the absolute value of the error of the tangent line approximation is no bigger
than (M/2)(x − a)².


We actually proved this only when a < x, but a similar argument works for the 
case a > x. We also assumed that the second derivative exists and (when we invoked 
the fundamental theorem of calculus) that f''(x) is continuous. This last assumption
isn’t actually necessary, but the proof gets harder if we don’t make it. The only thing 
we really need is that the second derivative exists, so I stated the theorem only with 
that assumption. 
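
The theorem can be tested numerically on an example. The Python sketch below takes f(x) = √x with a = 4 and x between 4 and 5; the value M = 1/32 is an assumption worked out here rather than in the text. It comes from f''(t) = −1/(4t^(3/2)), whose absolute value is largest at t = 4 on that interval.

```python
import math

# Assumed bound on |f''(t)| = 1 / (4 t^(3/2)) for 4 <= t <= 5 (largest at t = 4).
M = 1 / 32


def tangent_error(x):
    """Actual error of the estimate sqrt(x) ~ 2 + (x - 4)/4."""
    return math.sqrt(x) - (2 + (x - 4) / 4)


def error_bound(x):
    """The bound (M/2)(x - a)^2 from Theorem 1.6.1 with a = 4."""
    return M / 2 * (x - 4) ** 2


for k in range(0, 11, 2):
    x = 4 + k / 10
    print(f"x = {x:.1f}  |error| = {abs(tangent_error(x)):.9f}"
          f"  bound = {error_bound(x):.9f}")
```

In every row the actual error should sit below the bound, and for this function the two are fairly close.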

This is the same kind of result we had before: an estimate on how big the next 
derivative can be allows us to estimate the error. The only surprise is that we get 
M/2 rather than just M. But we won’t complain, since M/2 is smaller.

Let’s try this on one of the examples above. Let f(x) = e^x, so that the tangent
line approximation near 0 is f(x) ≈ 1 + x. Let’s suppose that we are interested in x
between 0 and 1. To use the error bound, we need to estimate the second derivative 
in that range. 

Now, we know f''(t) = e^t (the exponential is very friendly this way... ). How
big is that between t = 0 and t = 1? Students always find that a hard question, so let
me spell out one way to figure it out. Notice first that e^t is an increasing function,
so the biggest possible value for the second derivative e^t is e^1 = e. That’s annoying
because e is an ugly number. But we don’t need an exact bound for the derivative, 
because we are just going to get an error bound anyway. So remember that e < 3. 

   x         e^x         e^x − (1 + x)      (3/2)x²
 point     function          error        error bound
 1.00    2.7182818285    0.7182818285    1.5000000000
 0.90    2.4596031112    0.5596031112    1.2150000000
 0.80    2.2255409285    0.4255409285    0.9600000000
 0.70    2.0137527075    0.3137527075    0.7350000000
 0.60    1.8221188004    0.2221188004    0.5400000000
 0.50    1.6487212707    0.1487212707    0.3750000000
 0.40    1.4918246976    0.0918246976    0.2400000000
 0.30    1.3498588076    0.0498588076    0.1350000000
 0.20    1.2214027582    0.0214027582    0.0600000000
 0.10    1.1051709181    0.0051709181    0.0150000000
 0       1               0               0


Table 1.4: Comparing the error and the error bound for f(x) = e^x


Then we know that if 0 ≤ t ≤ 1 then f''(t) = e^t ≤ e < 3. Taking M = 3 in our
formula, we get the error bound


$$|\mathrm{Err}(x)| \le \frac{3}{2}(x - 0)^2 = \frac{3}{2}x^2.$$


The conclusion, then, is that


Theorem 1.6.2. For any x between 0 and 1, we have e^x ≈ 1 + x with error at most
(3/2)x².

This bound confirms what we already know: the error goes to zero faster than x.
But it also tells us a bit more: the error goes to zero about as fast as x².

Table 1.4 is like the ones above, but the column for the relative error has been
replaced by a column showing the error bound we just obtained. Notice that all the
errors are (as expected) smaller in absolute value than the bound. In fact, they are
quite a bit smaller.
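
The comparison in Table 1.4 can be reproduced and checked mechanically. This Python sketch uses the bound (3/2)x² with M = 3 from the text; the function names are invented for illustration.

```python
import math


def exp_error(x):
    """Error of the estimate e^x ~ 1 + x."""
    return math.exp(x) - (1 + x)


def exp_bound(x):
    """Error bound (3/2) x^2, valid for 0 <= x <= 1 because e^t < 3 there."""
    return 1.5 * x**2


for k in range(10, -1, -1):
    x = k / 10
    err, bound = exp_error(x), exp_bound(x)
    verdict = "ok" if abs(err) <= bound else "VIOLATED"
    print(f"{x:.2f}  {math.exp(x):.10f}  {err:.10f}  {bound:.10f}  {verdict}")
```

Adding a column with the quotient of the error by the bound is one way to start exploring how much room the bound leaves.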


1.6.1. Problems 


Problem 1.6.1: The error estimate for the exponential is much easier for −1 < t < 0.
Why? 


Problem 1.6.2: Work out the error estimate for the tangent line approximation for 
sin(x) and construct a table like the one we made for e^x. Notice that the real error is
much smaller than predicted. 


Problem 1.6.3: Let f(x) = cos(x), whose tangent line approximation, as we saw,
is cos(x) ≈ 1 when x is close to 0. Find a bound on the error of the tangent line
approximation that holds when −1 < x < 1.


Problem 1.6.4: Let f(x) = ln(1 + x).


a. Check that the tangent line approximation to f(x) near a = 0 is ln(1 + x) ≈ x.


b. Suppose we want to approximate ln(1.5). (Notice that this means making x =
0.5 in our formula!) Will our estimate be larger or smaller than the true answer? 
Why? 


c. Find a bound for the error of the approximation when x = 0.5. 


Challenge: Here’s a puzzle. For small values of x, the errors in Table 1.4 are smaller
than the bounds by a factor of approximately 3. So the error is much closer to x²/2
than to 3x²/2. Why?




2 Getting Closer with Polynomials


We have found a reasonably good way to approximate the value of a function f(x) 
when x is close to a reference value a. Can we do better? (Well, of course we can; 
why else would I ask the question?) 

So far, we have created the approximations by choosing an approximating function
that, at the point a, agreed to some extent with the function we want to approximate.
So, in the constant line approximation, we used the function est(x) = f(a) to
approximate f(x), and the two functions have the same value at a.

Then we went to the tangent line approximation. For that one, we chose the 
approximating function est(x) = f(a) + f’(a)(x — a). This has the same value at 
a, i.e., est(a) = f(a), but it also has the same derivative at a: est'(a) = f’(a). 

Well, there are second and higher derivatives. Perhaps we can use them? Of 
course, that means we have to assume that f''(a), f'''(a), ... actually exist. So from
now on, we will assume that our function has as many derivatives as we want. Tech- 
nically, we are assuming that f(x) is infinitely differentiable. 

An astute student might ask, at this point, why we expect this to work. After all, to 
know the derivatives at x = a, we only need to know how the function f(x) behaves 
very near a. The definitions involve limits, which only care about what happens when 
x approaches a. So why should data that is very local to a give information about 
other points, even if they are somewhat close to a? 

The most honest answer is that I just know it works. People have tried it and got 
good results. 

One might argue that knowing the value of several derivatives at x = a tells us a
lot of information about how the function is changing, and so might allow us to create 
something that agrees with our function at x = a and changes similarly. We could 
also argue in reverse. Suppose we have found an approximation that does match f(x) 
very closely near x = a. Then the approximation and the function should be very 
similar in a neighborhood of a, which implies that their derivatives at a should be 
very similar as well. In other words, if we succeed in finding a good approximation, 
its derivatives should be closely related to the derivatives of the original function. 

Either way, we are hoping for a miracle: that information very specific to the 
point x = a will “spread” to nearby points. The extent to which this happens is one 
of the things we will need to work out. 

So here is the method we will use: to find a good approximation, require that 
as many as possible of the derivatives of the approximation agree with those of the 
function. We’ll start by showing how to do it for degree 2, then move to the general 
case. 


2.1 Approximations of degree 2


Since we have more derivatives to work with, let’s try to set something up that 
agrees with value, derivative, and second derivative. Of course, that thing can’t be 
a line any more, because the second derivative of any line is zero. So we will try 
something of degree 2 instead. 

It’s easiest to work with an example first. Let’s put $f(x) = \sqrt{x}$ and choose a = 4
as before. We want to create an estimating function near a = 4. We found the tangent
line approximation earlier: it looks like $2 + \frac{1}{4}(x - 4)$. Since we now want something
of degree 2, let’s try $\mathrm{est}(x) = 2 + \frac{1}{4}(x - 4) + c(x - 4)^2$ and try to figure out what c
should be equal to.

OK, so

\[ a = 4, \qquad f(x) = \sqrt{x}, \]

\[ \mathrm{est}(x) = 2 + \frac{1}{4}(x - 4) + c(x - 4)^2. \]


Notice first that $\mathrm{est}(4) = 2 = \sqrt{4} = f(4)$; that’s the neat thing about writing
everything in terms of powers of (x - 4): they vanish when x = 4.
Now let’s compute the derivative of our estimating function:

\[ \mathrm{est}'(x) = \frac{1}{4} + 2c(x - 4). \]

So est'(4) = 1/4. On the other hand, $f'(x) = \frac{1}{2}x^{-1/2} = \frac{1}{2\sqrt{x}}$, so f'(4) = 1/4 as
well. (No surprise, that’s how we found the slope of the tangent line.)


The second derivative of est(x) is easy to compute, because it is constant:

\[ \mathrm{est}''(x) = 2c. \]

The second derivative of f(x) is a bit harder, but not much; since $f'(x) = \frac{1}{2}x^{-1/2}$,

\[ f''(x) = \frac{1}{2}\left(-\frac{1}{2}\right)x^{-3/2} = -\frac{1}{4x^{3/2}}. \]

That’s a beast, but it isn’t bad to plug in x = 4: since $4^{1/2} = 2$, we have $4^{3/2} = 8$,
so $f''(4) = -\frac{1}{4 \cdot 8} = -\frac{1}{32}$. If we want this to agree with est''(4) = 2c, we have to
choose c = -1/64.


Oooof. So, it seems that the second-degree approximation to $\sqrt{x}$ near x = 4
should be

\[ \sqrt{x} \approx 2 + \frac{1}{4}(x - 4) - \frac{1}{64}(x - 4)^2. \]

Try graphing that to see that it does approximate $f(x) = \sqrt{x}$ quite well near x = 4.
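
If you would rather see numbers than a graph, here is a small sketch in plain Python (not SageMath). The function names est and tangent are made up for this illustration; the formulas are the ones worked out above for a = 4.

```python
import math

def tangent(x):
    """Tangent line approximation to sqrt(x) centered at a = 4."""
    return 2 + (x - 4) / 4

def est(x):
    """Degree 2 approximation to sqrt(x) centered at a = 4, with c = -1/64."""
    return 2 + (x - 4) / 4 - (x - 4) ** 2 / 64

# Columns: x, sqrt(x), error of the tangent line, error of the degree 2 estimate.
for x in [3.0, 3.9, 4.1, 5.0]:
    exact = math.sqrt(x)
    print(x, exact, abs(exact - tangent(x)), abs(exact - est(x)))
```

At each of these points the degree 2 error should come out smaller than the tangent line error, and both should shrink as x gets closer to 4.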


Do we dare attempt to work that out in the general case? 
Yes. We dare. [2] 

We are given a function f(x) and a reference point a. We assume that we know
f(a), f'(a), and f''(a). We want to use an estimating function of the form

\[ \mathrm{est}(x) = c_0 + c_1(x - a) + c_2(x - a)^2. \]


The rule for choosing the coefficients is that we want est(x) and f(x) to have the 
same value, derivative, and second derivative at x = a. Let’s write them out: 


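\[
\begin{aligned}
\mathrm{est}(x) &= c_0 + c_1(x - a) + c_2(x - a)^2 \\
\mathrm{est}'(x) &= c_1 + 2c_2(x - a) \\
\mathrm{est}''(x) &= 2c_2
\end{aligned}
\]

Plugging in x = a (see how neat the powers of (x - a) are!), we get

\[
\begin{aligned}
\mathrm{est}(a) &= c_0 \\
\mathrm{est}'(a) &= c_1 \\
\mathrm{est}''(a) &= 2c_2
\end{aligned}
\]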


Since those are supposed to agree with f(a), f'(a), and f''(a), we need to choose

\[ c_0 = f(a), \qquad c_1 = f'(a), \qquad c_2 = \frac{f''(a)}{2}. \]

So, in general, the degree 2 approximating function to f(x) near a is

\[ f(a) + f'(a)(x - a) + \frac{f''(a)}{2}(x - a)^2. \]


These approximating polynomials (of any degree) were considered by the British
mathematician Brook Taylor, so they are known as Taylor polynomials.¹ In that
language, we have


The degree two Taylor polynomial for f(x) centered at a is

\[ f(a) + f'(a)(x - a) + \frac{f''(a)}{2}(x - a)^2. \]

Notice that this degree 2 approximation is a refinement of the tangent line (degree 
1) approximation: the first two terms are the same, and we have added a degree 2 term 
that (we hope!) makes the approximation better. 

Things to think about: 


a. For the tangent line approximation, we had both a theorem about the limit of 
the error as x → a and an error estimate using the second derivative. What do
you think will be the case for the degree 2 approximation? 


b. What would you do to get a degree 3 approximation? 


The answers will soon come, but think about it on your own for a bit before you read 
on. 


¹ It is rarely true that the person whose name is attached to a mathematical idea is actually the person
who first created the idea. Taylor was certainly not the first person to study Taylor polynomials. 

2.1.1 Problems


Problem 2.1.1: How good is the approximation

\[ \sqrt{x} \approx 2 + \frac{1}{4}(x - 4) - \frac{1}{64}(x - 4)^2? \]


Try it! 


Problem 2.1.2: Foreshadowing: following what we did for degree 2, work out a 
degree 4 polynomial that approximates $e^x$ near a = 0. (Remember that the goal is to
find a polynomial so that the value, and the first, second, third, and fourth derivatives 
agree with those of the function when x = 0.) 


2.2 Doing better: the general case 


Let’s do the general case now. We are given: 

• A function f(x) that we assume can be differentiated many times.
• A reference point a where ALL IS KNOWN.
• A degree n.


We want to create a polynomial of degree n that is a good approximation to f(x)
when x ≈ a. Following the idea we used before, we will write our polynomial in
powers of (x - a). There are two reasons to do this. First, it makes plugging in x = a
very easy. Second, when x is close to a the value of x - a will be small, which means
that powers $(x - a)^k$ get smaller as k gets bigger.

The polynomials we are going to create are called Taylor Polynomials, so I will
denote them by $T_n(x)$, assuming that the function f(x) and the reference point (often
called the “center”) are understood (I really don’t want to write $T_{f,a,n}(x)$. Do you?).
So we want to write

\[ T_n(x) = c_0 + c_1(x - a) + c_2(x - a)^2 + \cdots + c_n(x - a)^n \]


and now we need to determine what the coefficients should be. The principle we will 
use is that the value and first n derivatives of f(x) and of $T_n(x)$ should agree when
we plug in x = a. Since we are working with a general function f(x), we can only
write the derivatives symbolically:

\[ f(a), \quad f'(a), \quad f''(a), \quad \ldots, \quad f^{(n)}(a). \]

But for $T_n(x)$, we can work these out in terms of the coefficients $c_k$.

The value is easy: $T_n(a) = c_0$, so we must choose $c_0 = f(a)$. Notice that this is
just the “make believe it’s constant” approximation.

The derivative of $T_n(x)$ is

\[ T_n'(x) = c_1 + 2c_2(x - a) + 3c_3(x - a)^2 + \cdots + n c_n(x - a)^{n-1}. \]

So when we make x = a, we get $T_n'(a) = c_1$. Since our goal is to have $T_n'(a) = f'(a)$,
we need to choose $c_1 = f'(a)$. So our polynomial starts with

\[ f(a) + f'(a)(x - a) + \cdots, \]

reproducing the tangent line approximation.
One more: the second derivative of $T_n(x)$ is

\[ T_n''(x) = 2c_2 + 2 \cdot 3\, c_3(x - a) + 3 \cdot 4\, c_4(x - a)^2 + \cdots + (n - 1)n\, c_n(x - a)^{n-2}. \]

Plugging in x = a we get $T_n''(a) = 2c_2$, and since we want $T_n''(a) = f''(a)$, we want
$c_2 = \frac{1}{2} f''(a)$, just as in the previous section.


The general pattern is like this: the term $c_k(x - a)^k$ has derivative $k c_k(x - a)^{k-1}$,
whose derivative is $(k - 1)k c_k(x - a)^{k-2}$, and so on. When we differentiate k times
we end up with

\[ T_n^{(k)}(x) = 2 \cdot 3 \cdot 4 \cdots (k - 1)k\, c_k + \text{stuff in terms of } (x - a). \]

That product (all the integers from 1 to k multiplied together) is denoted k! and called
“k factorial.” Let’s rewrite using that notation:

\[ T_n^{(k)}(x) = k!\, c_k + \text{stuff in terms of } (x - a). \]

Plugging in x = a makes all the “stuff” be zero, so we get $T_n^{(k)}(a) = k!\, c_k$. Since our
goal is to have $T_n^{(k)}(a) = f^{(k)}(a)$, we must choose

\[ c_k = \frac{f^{(k)}(a)}{k!}. \]


This awful formula is not so hard to understand: it says that the coefficient of $(x - a)^k$
in our polynomial should be the value of the k-th derivative at a divided by k!.
Will that work when k = 0? It will as long as we decide that

• the “zero-th derivative” is just the function: $f^{(0)}(a) = f(a)$;
• the expression $(x - a)^0$ just means 1 no matter what the value of x is;
• the factorial of zero is 1, so 0! = 1.


Those are just choices to make the notation more convenient. Now let’s grit our teeth 
and write the thing out. 


The Taylor polynomial of degree n centered at a for the function f(x) is

\[ T_n(x) = f(a) + f'(a)(x - a) + \frac{f''(a)}{2}(x - a)^2 + \cdots + \frac{f^{(n)}(a)}{n!}(x - a)^n. \]

The term of degree k in the sum looks like

\[ \frac{f^{(k)}(a)}{k!}(x - a)^k. \]
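
The recipe "divide the k-th derivative at a by k!" is easy to turn into a few lines of code. What follows is only a sketch in plain Python, not SageMath; the names taylor_coefficients and taylor_eval are invented for this illustration, and the list of derivative values has to be supplied by hand.

```python
from math import factorial

def taylor_coefficients(derivs_at_a):
    """Given [f(a), f'(a), f''(a), ...], return the coefficients c_k = f^(k)(a) / k!."""
    return [d / factorial(k) for k, d in enumerate(derivs_at_a)]

def taylor_eval(derivs_at_a, a, x):
    """Evaluate the Taylor polynomial c_0 + c_1 (x - a) + ... + c_n (x - a)^n at x."""
    coeffs = taylor_coefficients(derivs_at_a)
    return sum(c * (x - a) ** k for k, c in enumerate(coeffs))

# The square root example: f(4) = 2, f'(4) = 1/4, f''(4) = -1/32.
print(taylor_coefficients([2, 1/4, -1/32]))   # [2.0, 0.25, -0.015625]
print(taylor_eval([2, 1/4, -1/32], 4, 5))     # 2.234375, compare sqrt(5) = 2.236...
```

The last coefficient is -1/64, matching the value of c found in the degree 2 example.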


There is a convention to replace the name “Taylor” with the name “Maclaurin” 
when a = 0. Silly as that is, you should know about it, since people will use it. Colin 
Maclaurin was a more important mathematician than Brook Taylor, but there’s really 
no point in assigning his name to a special kind of Taylor polynomials. Maclaurin 
polynomials are just Taylor polynomials centered at zero. The same comment applies 
to “Taylor series” and “Maclaurin series”: the latter just means “centered at zero.”

Notice that the formula for the term of degree k is the same in all the Taylor 
polynomials whose degree is k or more. For example, the coefficients of the
polynomials of degree n and of degree n + 1 will be the same up to degree n;
for $T_{n+1}$, we just add one extra term. That’s good: it lets us think of each new term
as a correction added to the previous approximation. Or at least we hope so. 

The fact that k! appears in the denominator is potentially very good. As the
integer k gets bigger, k! grows very fast. A big number in the denominator makes things
smaller. In the general term

\[ \frac{f^{(k)}(a)}{k!}(x - a)^k, \]

the k! makes things smaller and, when |x - a| < 1, the powers $(x - a)^k$ do as well. So
unless the derivatives $f^{(k)}(a)$ grow very fast, each term is likely to be smaller than
the previous ones, at least when x is close to a.

As I already pointed out, if n = 1, we get $T_1(x) = f(a) + f'(a)(x - a)$, which is
the tangent line approximation. So, the Taylor polynomials generalize what we have 
done so far. 

Unfortunately, computing higher derivatives can be pretty hard unless there’s an 
easy pattern. We will later work out ways to find these polynomials without comput- 
ing the higher derivatives. Of course, SageMath is happy to compute derivatives for 
us, so it can help us find Taylor polynomials. What it can’t do, however, is identify 
a pattern that the coefficients follow (even when there is a pattern). 


Let’s do an example. Suppose $f(x) = \frac{1}{1 - x} = (1 - x)^{-1}$ and choose a = 0. Let’s
work out the values of the derivatives.

• f(0) = 1.
• $f'(x) = (1 - x)^{-2}$, so f'(0) = 1.
• $f''(x) = 2(1 - x)^{-3}$, so f''(0) = 2.
• $f'''(x) = 3 \cdot 2(1 - x)^{-4}$, so f'''(0) = 6 = 3!.
• In general, $f^{(k)}(x) = k!(1 - x)^{-(k+1)}$, so $f^{(k)}(0) = k!$. (Can you convince
yourself?)

And now, using the formula above, we see that all the coefficients are one:
$c_k = \frac{k!}{k!} = 1$ for all k. (The easiest possible pattern!) So the degree n Taylor polynomial
for $\frac{1}{1 - x}$ centered at 0 is

\[ T_n(x) = 1 + x + x^2 + x^3 + \cdots + x^n. \]


We'll have to wait till a later section to see what we can prove about these polynomi- 
als. It’s worth noticing, however, that in this case the derivatives do grow very fast 
and cancel out the big denominators. 

For now, we can at least compare graphs when n = 10. Let’s ask SageMath to 
do that: 


sage: f(x)=1/(1-x)
sage: T10(x)=1+x+x^2+x^3+x^4+x^5+x^6+x^7+x^8+x^9+x^10
sage: plot([f(x),T10(x)],(-1,1))


The output is this: 


[Figure: SageMath plot of f(x) = 1/(1 - x) and T_10(x) for -1 < x < 1.]


SageMath always uses the sequence of colors blue, green, red, etc., when we plot
several functions. So the blue graph is f(x) = 1/(1 - x); notice that it blows up as x
approaches 1. The green graph is the Taylor polynomial of degree 10. It does seem to 
give a good approximation near x = 0, but it gets pretty bad as we approach x = 1. It 
stays a bit closer on the negative side, but also drifts away eventually. This confirms 
the basic intuition that our approximations should work reasonably well near the 
reference point x = a, but there is no reason to expect them to work well for values 
of x that are not near to a. 
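
The same comparison can be made with numbers instead of a picture. This is a minimal sketch in plain Python, with the helper names f and T chosen just for this example.

```python
def f(x):
    return 1 / (1 - x)

def T(n, x):
    """Degree n Taylor polynomial of 1/(1 - x) centered at 0: 1 + x + ... + x^n."""
    return sum(x ** k for k in range(n + 1))

# Columns: x, f(x), T_10(x), and the size of the error.
for x in [0.1, 0.5, 0.9, -0.5, -0.9]:
    print(x, f(x), T(10, x), abs(f(x) - T(10, x)))
```

The error should be tiny at x = 0.1, much larger at x = 0.9, and somewhere in between at x = -0.9, which is what the graph suggests.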

Here’s another easy example. Let’s work with f(x) = sin(x) and find some of
its Taylor polynomials centered at zero. First, consider the derivatives of the sine 
function: 


\[
\begin{aligned}
f(x) &= \sin(x) \\
f'(x) &= \cos(x) \\
f''(x) &= -\sin(x) \\
f'''(x) &= -\cos(x) \\
f^{(4)}(x) &= \sin(x)
\end{aligned}
\]

and repeat.
Since it repeats after four steps, so do the values at x = 0. They are 


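\[ 0, \; 1, \; 0, \; -1, \; 0, \; 1, \; 0, \; -1, \; \ldots \]

To get the coefficients, we divide these by the factorials, so the coefficients are

\[ 0, \; 1, \; 0, \; -\frac{1}{3!}, \; 0, \; \frac{1}{5!}, \; 0, \; -\frac{1}{7!}, \; \ldots \]

So the Taylor polynomials look like

\[ T_n(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots \]

(Notice that since the even-numbered coefficients are all zero, we have $T_1(x) = T_2(x)$
and so on.)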
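
Here is a short sketch of that bookkeeping in plain Python, using exact fractions so the factorials stay visible. The function name sine_coefficients is made up for this illustration.

```python
from fractions import Fraction
from math import factorial

def sine_coefficients(n):
    """Coefficients c_0, ..., c_n of the degree n Taylor polynomial of sin(x) at 0."""
    cycle = [0, 1, 0, -1]   # values of sin, cos, -sin, -cos at x = 0, repeating
    return [Fraction(cycle[k % 4], factorial(k)) for k in range(n + 1)]

print(sine_coefficients(7))
```

The list should show 0 in every even position and the values 1, -1/6, 1/120, -1/5040 in the odd positions.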


I said SageMath can do it for you. Here’s how. Given a function f(x), we want 
SageMath to compute its Taylor polynomial of degree n centered at a. The command 
for doing that is taylor(f(x),x,a,n). In that command, x tells SageMath which
variable you are working with (SageMath can work with more than one), a should be
a number (the center), and n should be a positive integer (the degree). Of course, you
can play around to see what happens if you break these rules: if you tell SageMath
a is a variable and use it as a center, it seems to work!

For the example we just did, 


sage: T10(x)=taylor(1/(1-x),x,0,10) 
sage: T10(x) 
x^10 + x^9 + x^8 + x^7 + x^6 + x^5 + x^4 + x^3 + x^2 + x + 1


That’s what we got as well, but notice that (annoyingly!) SageMath writes the higher 
degree terms first. 

What SageMath can’t do, however, is figure out a pattern for you. In the case of
1/(1—x), it’s fairly obvious that all the coefficients are equal to 1. But even for sin(x) 
it’s not that easy: you would see only the odd powers, but SageMath writes out the 
values of the factorials, so it is not necessarily easy to recognize the pattern. 

It can be even harder; here’s an example: 

sage: taylor(tan(x),x,0,11)
1382/155925*x^11 + 62/2835*x^9 + 17/315*x^7 + 2/15*x^5 + 1/3*x^3 + x


Aside from the fact that all the even degree terms are zero, I don’t see any pattern in 
the coefficients. Do you? 


2.2.1 Problems 


Problem 2.2.1: Why doesn’t $f(x) = x^{1/2}$ have a Taylor polynomial of degree 3
centered at a = 0? 


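Problem 2.2.2: Rodolfo attempted to use the formula

\[ f(x) \approx f(a) + f'(a)(x - a) + \frac{f''(a)}{2}(x - a)^2 + \cdots + \frac{f^{(n)}(a)}{n!}(x - a)^n \]

to find the fifth-degree Taylor polynomial for $f(x) = e^x$ near a = 0. He got

\[ e^x + e^x x + \frac{e^x}{2}x^2 + \frac{e^x}{6}x^3 + \frac{e^x}{24}x^4 + \frac{e^x}{120}x^5. \]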


That must be wrong, since it’s not even a polynomial. What did Rodolfo do wrong? 


Problem 2.2.3: We graphed f(x) = 1/(1 — x) and its degree 10 Taylor polynomial 
centered at zero $T_{10}(x)$. Make more graphs to see if you can find out more. What
happens when x > 1? How about x < -1?


Problem 2.2.4: Try making a longer polynomial, say of degree 100, using 
T100(x)=sum(x^n,n,0,100).
Compare that to f(x) and to the degree 10 polynomial. 


Challenge: Can you figure out a “closed” formula for the sum $1 + x + x^2 + \cdots + x^n$?
(A “closed” formula is one that has no dots or summation signs.) We will do this 
soon, but give it a try now. 


Problem 2.2.5: (Nothing to offer here but blood, sweat, tears, and toil.) In each item 
below I give a function f(x), a reference point a, and a degree n. Compute enough 
derivatives and then use the general formula to find the Taylor polynomial of degree 
n for the function f(x) around a. 
(Taking higher derivatives is usually not much fun. Use SageMath if you like!) 
Once you have found all the series, use the taylor command in SageMath to 
check your answers. 


a. $f(x) = \frac{1}{1 + x}$, a = 0, n = 4.


b. f(x) = cos(x), a = 0, n = 6.


c. f(x) = ln(x), a = 1, n = 4.




d. f@) =(1+x),a=0,n=4. 
e. f(x) = a= 2 n= 4. 
x 
f. f(x) = —,,a=0,n=4. 
f@= 5 
Problem 2.2.6: Suppose you know that the Taylor polynomial of degree 10 of a
function f(x) is

\[ \sum_{n=0}^{10} \frac{(x - 1)^n}{n!}. \]

What is $f^{(5)}(1)$ equal to? (Remember that $f^{(5)}(1)$ means the value at x = 1 of the
fifth derivative of f(x).)


2.3. Taylor polynomials and derivatives 


The way that Taylor polynomials are constructed, by taking many derivatives 
and then plugging in x = a, means that they interact well with taking derivatives. 
The degree k term of a Taylor polynomial is

\[ \frac{f^{(k)}(a)}{k!}(x - a)^k. \]

If we differentiate that, we get

\[ \frac{f^{(k)}(a)}{k!}\, k(x - a)^{k-1} = \frac{f^{(k)}(a)}{(k - 1)!}(x - a)^{k-1}, \]

since k! = k · (k - 1)!. That looks a lot like the term of degree k - 1 of a Taylor
polynomial, but it has a k-th derivative instead. But (I know this is a brain-twister)
the k-th derivative of f(x) is actually the (k — 1)-st derivative of f'(x). So by dif- 
ferentiating the degree k term of the Taylor polynomial for f(x) we get exactly the 
degree k — 1 term of the Taylor polynomial for f’(x). 

Since that works for each term, we conclude: 


Theorem 2.3.1. The derivative of the degree n Taylor polynomial for a function f (x) 
is equal to the degree n — 1 Taylor polynomial for f'(x). 


That is very useful to know. For example, the derivative of the degree 3 Taylor 
polynomial for sin(x) is the degree 2 Taylor polynomial for cos(x). So if we know the 
Taylor polynomials for sin(x) we can find the Taylor polynomials for cos(x). And, 
of course, we can also read it in terms of antiderivatives, as long as we remember 
that taking antiderivatives introduces an arbitrary constant. 


Theorem 2.3.2. The antiderivative of the degree n Taylor polynomial for a function 
f(x) is (up to an additive constant) equal to the degree n + 1 Taylor polynomial for
the antiderivative of f(x).


In any specific case, we can find the constant pretty easily by plugging in x = a, 
which makes all the terms vanish except that constant term. 
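
For polynomials centered at zero, Theorem 2.3.1 can be checked by differentiating a list of coefficients term by term. The following is a minimal sketch in plain Python; the helper names sin_coeffs, cos_coeffs, and differentiate are assumptions made for this example.

```python
from fractions import Fraction
from math import factorial

def sin_coeffs(n):
    """Coefficients of the degree n Taylor polynomial of sin(x) at 0."""
    cycle = [0, 1, 0, -1]
    return [Fraction(cycle[k % 4], factorial(k)) for k in range(n + 1)]

def cos_coeffs(n):
    """Coefficients of the degree n Taylor polynomial of cos(x) at 0."""
    cycle = [1, 0, -1, 0]
    return [Fraction(cycle[k % 4], factorial(k)) for k in range(n + 1)]

def differentiate(coeffs):
    """Differentiate c_0 + c_1 x + ... + c_n x^n term by term."""
    return [k * c for k, c in enumerate(coeffs)][1:]

# Derivative of the degree 3 polynomial for sin versus the degree 2 polynomial for cos.
print(differentiate(sin_coeffs(3)) == cos_coeffs(2))   # True
```

Going the other way, an antiderivative would divide each coefficient by k + 1 and shift it up one degree, leaving the constant term to be fixed by plugging in x = a.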

2.3.1 Problems


Problem 2.3.1: Check it: find the degree 5 Taylor polynomial for sin(x) and the 
degree 4 Taylor polynomial for cos(x), then verify that the second is equal to the 
derivative of the first. 


Problem 2.3.2: Check that the derivative of the degree n Taylor polynomial for $e^x$
is the degree n - 1 Taylor polynomial for $e^x$.


2.4 How close to the function is the Taylor polynomial 
of degree n? 


Here’s what we know so far. We are given a function f(x), a reference point 
a, and a degree n. With that, we made up a polynomial that we hope gives a good 
approximation. We know the next question we must ask: what can we say about the 
error? 

The Taylor polynomial of degree n centered at a for the function f(x) is

\[ T_n(x) = f(a) + f'(a)(x - a) + \frac{f''(a)}{2}(x - a)^2 + \cdots + \frac{f^{(n)}(a)}{n!}(x - a)^n. \]

What can we say about this? Does it actually approximate f(x)?

Our model is the tangent line approximation, which in this notation is $T_1(x)$.
Remember that in that case we had two good properties:


• We had a limit property:

\[ \lim_{x \to a} \frac{f(x) - T_1(x)}{x - a} = 0 \]

and in fact $T_1(x)$ is the only linear function with this property. More precisely,
if we replace $T_1(x)$ with any other linear function, this limit will not be zero.
In other words, $T_1(x)$ is the best degree 1 approximation to f(x) near a.


• We had an error estimate: if we know that $|f''(t)| \le M$ for all t between a and
x, then

\[ |f(x) - T_1(x)| \le \frac{M}{2}|x - a|^2. \]


Notice that the second property implies the first: something bounded by $Mh^2$, if we
divide by h, becomes bounded by Mh, which goes to zero. But to get the second 
property we needed to know there was a second derivative, which the first property 
did not require. 

Exactly the same two properties hold in general. (Indeed, the proofs, which I 
won't include, are pretty much the same. A good place to read the proofs, if you 
want to, is [19, chapter 20].) 

Theorem 2.4.1 (Limit Property) If f(x) is an n-times differentiable function and
$T_n(x)$ is its degree n Taylor polynomial centered at a, then

\[ \lim_{x \to a} \frac{f(x) - T_n(x)}{(x - a)^n} = 0 \]

and in fact $T_n(x)$ is the only polynomial of degree n with this property; if we replace
it with any other polynomial of degree n the limit will not be zero. In other words,
$T_n(x)$ is the best degree n approximation to f(x) near a.


For the error estimate, we need to know how big the (n + 1)-st derivative is. I will
use $M_n$ for that estimate in order to highlight the fact that the inequality will depend
on the value of n.


Theorem 2.4.2 (Error Estimate) Suppose f(x) is an (n + 1)-times differentiable
function and $T_n(x)$ is its degree n Taylor polynomial centered at a. If we know that the
(n + 1)-st derivative satisfies

\[ |f^{(n+1)}(t)| \le M_n \]

for all t between a and x, then

\[ |f(x) - T_n(x)| \le \frac{M_n}{(n + 1)!}|x - a|^{n+1}. \]


As before, the second property is stronger than the first, but it needs one more 
derivative. 

It’s worth noticing that the error term is almost the same as the degree n + 1 term
of the next Taylor polynomial. The degree n + 1 term would be

\[ \frac{f^{(n+1)}(a)}{(n + 1)!}(x - a)^{n+1}. \]

In the error term we replace that (n + 1)-st derivative at a with our bound for the
(n + 1)-st derivative between a and x. This confirms the intuition that each extra 
term is a correction. 

The upshot is that these polynomials provide a good way to find approximate 
values of f(x), at least if x is near the center point a. Of course, in the limit property, 
we don’t know how close it needs to be. The error estimate can tell us that more 
precisely, at least if we can find a good bound $M_n$.

One problem here is that the bound $M_n$ depends on the (n + 1)-st derivative,
and so it can change when n changes. We’d like the approximation to get better as n
grows, but we can’t guarantee that in general: it’s always possible that $M_n$ is huge,
after all, and grows even faster than (n + 1)!.

We should note that Theorem 2.4.2 is only one way to bound the error when 
we use the degree n Taylor polynomial to approximate a function. (You can find 
two others in [19, ch. 20].) The dependence on finding a bound for the value of the 
(n+ 1)-st derivative makes it hard to use. There is no guarantee that it will work well 
in any particular case. It sometimes fails. But, as the next example shows, for some
especially nice functions it all works.
Let’s go back to f(x) = sin(x). We found that the Taylor polynomials look like

\[ T_n(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots \]

(Notice that since the even-numbered coefficients are all zero, we have $T_1(x) = T_2(x)$
and so on.) Now we can estimate the error too. Since all the derivatives are either
±sin(x) or ±cos(x), we have $|f^{(n+1)}(t)| \le 1$ for any t. So, we can take $M_n = 1$ for
every n to get

\[ |\sin(x) - T_n(x)| \le \frac{|x|^{n+1}}{(n + 1)!}. \]

Since our estimate for the derivative is true for any t, this error bound is true for
all values of x. Of course, powers of x are only small if |x| < 1, but maybe those
factorials in the denominator make up for that.

We noticed above that for the sine function we have $T_{2n+1}(x) = T_{2n+2}(x)$, since
the extra term in the latter is zero. So perhaps we should use the error estimate for
$T_{2n+2}(x)$, which has a bigger factorial in the denominator. In other words,

\[ |\sin(x) - T_{2n+1}(x)| \le \frac{|x|^{2n+3}}{(2n + 3)!}. \]


This explains the special behavior of the tangent line approximation to the sine function
that you noticed in Problems 1.5.3 and 1.6.2. If we think of $T_1(x) = x$, we get

\[ |\sin(x) - x| \le \frac{1}{2}|x|^2, \]

but if we use $T_2(x) = x$, we get

\[ |\sin(x) - x| \le \frac{1}{6}|x|^3. \]

When |x| < 1, as it was in our tables, the second estimate is much smaller. Essentially 
the same thing works for the cosine, with “odd” and “even” switched around. 
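
To see the two estimates side by side, here is a quick numerical sketch in plain Python. The names bound_T1 and bound_T2 are made up for this example; they are just the right-hand sides of the two inequalities for sin(x) - x.

```python
import math

def bound_T1(x):
    """Error bound from treating x as T_1(x): |x|^2 / 2."""
    return abs(x) ** 2 / 2

def bound_T2(x):
    """Error bound from treating x as T_2(x): |x|^3 / 6."""
    return abs(x) ** 3 / 6

# Columns: x, actual error |sin(x) - x|, the T_2 bound, the T_1 bound.
for x in [0.5, 0.1, 0.01]:
    print(x, abs(math.sin(x) - x), bound_T2(x), bound_T1(x))
```

For these values the actual error should sit just under the second bound, while the first bound is far too generous.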


2.5 What happens as n grows?


Our hope has been that when the degree of the approximating polynomial gets 
bigger, the approximation will improve. So far, we don’t know that this is true. In 
fact, it will not be true for all functions. 

Let’s continue pushing on the example we just computed and do some plotting, 
asking SageMath to plot the sine function and its Taylor polynomials of degree 1, 3, 
and 5. The commands would be 

sage: f(x)=sin(x)
sage: T1(x)=taylor(sin(x),x,0,1)
sage: T3(x)=taylor(sin(x),x,0,3)
sage: T5(x)=taylor(sin(x),x,0,5)
sage: plot([f(x),T1(x),T3(x),T5(x)],(x,-5,5),ymin=-1.5,ymax=1.5)


We get 


The sine function is the dark blue curve, $T_1(x)$ is green, $T_3(x)$ is red, and $T_5(x)$ is light
blue. Notice that we are actually seeing two things: better and better approximation
for a given x and a growing range of validity.

From the first point of view, x is fixed and n is changing: if we look at a small
number x, we see that $T_5(x)$ gives a better approximation than $T_3(x)$, though they
are all so good that near zero we can’t see much difference in the graph. But the 
difference is there: 


sage: T1(0.1)
0.100000000000000
sage: T3(0.1)
0.0998333333333333
sage: T5(0.1)
0.0998334166666667
sage: sin(0.1)
0.0998334166468282

Notice that $T_1(0.1)$ is a bit too big, $T_3(0.1)$ a little too small, then $T_5(0.1)$ is too big
again... but the last agrees with sin(0.1) to ten decimal places. (Can you see why
the “too big”/“too small” alternation is happening?) This is what we expected: for x
near zero, we are getting better and better approximations. 

The second has to do with varying x: $T_3(x)$ seems to stay near sin(x) for more
values of x than $T_1(x)$, and $T_5(x)$ does it longer than $T_3(x)$. This is a surprise: as we
have mentioned before, there is no reason to expect our approximations to work for 
large x. Nevertheless, it seems that they do. What happens if you try degree 7 or 9? 

We can frame two questions, then. First, the standard approximation question: 
for a fixed value of x, is it true that the error $|f(x) - T_n(x)|$ goes to zero as n grows?
This amounts to asking whether longer polynomials give better approximations. Our 
hope is that the answer is yes, at least when x is close to the center a. Our error 
estimate will sometimes allow us to answer this question. 

The second question is this. Suppose we know that the error goes to zero for 
some value of x; we can now consider all the values of x for which this happens. 
What does this set of values of x look like? For example, could it be that the sine 
series we just found converges for some x and not for others? 

We can try to use the error bound to answer both questions. For example, if
f(x) = sin(x), we know that

\[ |\sin(x) - T_n(x)| \le \frac{1}{(n + 1)!}|x|^{n+1}. \]

For some values of x, it’s easy to see that this is very small, and gets smaller as n
grows. Suppose, for example, that |x| < 1. Then $|x|^k < 1$ for all exponents k. So the
error is less than 1/(n + 1)!, which is a very small number that goes to zero as n gets 
big. 


2.5.1 Problems 


Problem 2.5.1: What about the other values of x? After all, when x is big a power 
$x^n$ gets very big too. We’ll check soon that the factorials beat the powers, so that
indeed it works for every x. Can you convince yourself without my help? 


Problem 2.5.2: Use the degree 2 Taylor polynomial to estimate cos(0.1). Find a 
bound for the error in your approximation. Compare your bound with the error 
according to SageMath. 


Problem 2.5.3: During an encounter with an extraterrestrial, it is revealed to you 
that the answer to life, the universe, and everything is f(1) for some function f. The 
extraterrestrial, when pressed, refuses to tell you what f(x) is, but it does let you 
know that 


• f(0) = 26, f'(0) = 22, f''(0) = -16, and f'''(0) = 12,
e For any x such that |x| < 3, we have | f)(x)| < 7x*. 


Find upper and lower bounds for the value of f(1). 

2.6 Dots, sigmas, and general terms


So far, we have mostly been writing sums using dots to indicate terms we do not 
write down. For example, 


\[ 1 + x + x^2 + x^3 + \cdots + x^{15}. \]


That is fine in examples like this one because the pattern is fairly clear. But it’s more 
honest to tell the reader what the pattern is, so I really should write 


\[ 1 + x + x^2 + x^3 + \cdots + x^n + \cdots + x^{15}. \]


The $x^n$ term is called a general term. It gives a recipe for the terms: take n = 0 to get
the 1, then n = 1 to get x, then n = 2, etc. That way no one can say “I don’t see the
pattern.”

Once we have a general term, we can also use summation notation, writing 


\[ \sum_{n=0}^{15} x^n = 1 + x + x^2 + x^3 + \cdots + x^{15}. \]


The big sigma Σ stands for “sum.” At the bottom, we give the first value of n to
consider, and at the top, we indicate the last value of n to use. To translate into a
sum, plug in each value in turn and stick + signs between them.

If we want to go on forever (and we soon will), we will put ∞ at the top instead:

\[ \sum_{n=0}^{\infty} x^n = 1 + x + x^2 + x^3 + \cdots + x^n + \cdots. \]


Summation notation takes much less space than dots and for some computations it 
is actually easier to use, because you can work directly with the general term. 

Alas, summation notation is also easier to get confused with, so I'll continue to 
prefer writing things out with dots, but you should practice translating back and forth 
between the two versions. 

For now, here is the formula for the Taylor polynomial of degree n written out in 
summation notation: 


\[ T_n(x) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}(x - a)^k. \]


Here, we have used our definitions (see page 29) for how to understand $f^{(k)}(a)$, k!,
and $(x - a)^k$ when k = 0.

Notice that since I wanted to use n for the stopping point I need to use a different 
counting variable for the general term. It doesn’t matter what we call that variable, 
because its only job is to be replaced by the numbers 0, 1, 2,.... 
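
Summation notation translates almost word for word into a loop. The sketch below is plain Python, with function names invented for this illustration; it writes the finite sum of x^n for n from 0 to 15 twice, once with counting variable n and once with k, to show that the name of the variable makes no difference.

```python
def geometric_sum(x, last):
    """Sum of x^n for n = 0, 1, ..., last, written as an explicit loop."""
    total = 0
    for n in range(last + 1):   # n takes the values 0, 1, ..., last in turn
        total += x ** n
    return total

def geometric_sum_k(x, last):
    """The same sum with the counting variable renamed to k."""
    return sum(x ** k for k in range(last + 1))

print(geometric_sum(2, 15))     # 1 + 2 + 4 + ... + 2^15 = 65535
print(geometric_sum_k(2, 15))   # same value
```

Note that range(last + 1) is needed because Python's range stops one short of its argument, whereas the number at the top of the sigma is included.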

2.6.1 Problems


Problem 2.6.1: Unpack each of these expressions by writing out the sums: 


(5 - DG -2)...G-2-D 


5 1 
ai pe aaa ae 
n! 


x2ntl 


_4yyntl 
d. YICD (2n+ 1)! 


n=0 


6 


nix" 
e. xe) i 


n=1 


2.7 The easiest examples: sine and cosine 


There are several functions whose Taylor polynomials are easy to compute, 
either directly by finding derivatives or by using other tricks. Furthermore, in some 
cases, we can estimate the error so well that we can conclude the approximation gets 
better and better (or not) as the degree n grows. Let’s work through some of them. 


2.7.1 The sine function 


We have already worked out that

\[ T_n(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots + (-1)^k \frac{x^{2k+1}}{(2k + 1)!}, \]

where 2k + 1 is the largest odd number not larger than n. We worked out that the error
is controlled by

\[ |\sin(x) - T_n(x)| \le \frac{|x|^{n+1}}{(n + 1)!}. \]

We can actually do a little bit better by remembering that in this case $T_{2n+1}(x) = T_{2n+2}(x)$,
since the coefficients of even powers of x are always zero. So we can say

\[ |\sin(x) - T_{2n+1}(x)| \le \frac{|x|^{2n+3}}{(2n + 3)!}. \]

As you may know, factorials grow much faster than powers. Let’s prove it. The
key point is to notice that

\[ \frac{|x|^k}{k!} = \frac{|x|}{1} \cdot \frac{|x|}{2} \cdot \frac{|x|}{3} \cdot \frac{|x|}{4} \cdots \frac{|x|}{k}. \]

As we go along, the number x is fixed, but the denominators keep growing, so we 
are multiplying by ever-smaller fractions. 

To estimate the product, we focus on when the fractions get smaller than some 
fixed number; say 1/2. Let N be the smallest integer bigger than 2|x|. Then if i > N 
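the fraction |x|/i is less than 1/2. Assuming we have a long enough product so that
N < k, we can split the product into two chunks: the fractions with i ≤ N and those
with i > N.

\[ \frac{|x|^k}{k!} = \left( \frac{|x|}{1} \cdot \frac{|x|}{2} \cdot \frac{|x|}{3} \cdots \frac{|x|}{N} \right) \left( \frac{|x|}{N + 1} \cdot \frac{|x|}{N + 2} \cdots \frac{|x|}{k} \right). \]
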
Since N is fixed, the first chunk is just some fixed number. That number is to be 
multiplied by the second chunk, which is the product of a whole bunch of fractions 


(well, k - N fractions), each of which is less than 1/2. So the product is smaller than
that:

\[ \frac{|x|^k}{k!} < (\text{first chunk}) \left( \frac{1}{2} \right)^{k-N}. \]


The first factor is fixed, but the second factor goes to 0 as k → ∞: multiplying by
1/2 over and over makes the product arbitrarily small.
The upshot is that no matter what x is

\[ \frac{|x|^k}{k!} \to 0 \quad \text{as } k \to \infty. \]

So no matter how big x is, the fraction $|x|^k/k!$ can be made to be as close to zero as
we like by taking k large enough.
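
The argument can be watched in action with a fairly large x. This is a minimal sketch in plain Python; the names term and tail_bound are assumptions made for this illustration, and tail_bound is only meant to be used when k is at least N.

```python
from math import factorial, floor

def term(x, k):
    """The quantity |x|^k / k!."""
    return abs(x) ** k / factorial(k)

def tail_bound(x, k):
    """(first chunk) times (1/2)^(k - N), where N is the smallest integer bigger than 2|x|."""
    N = floor(2 * abs(x)) + 1
    return term(x, N) * 0.5 ** (k - N)

x = 10
# First the terms grow, then the factorial takes over.
for k in [5, 10, 20, 30, 40, 60]:
    print(k, term(x, k))

# Past N = 21, each term stays under the bound from the two-chunk argument.
for k in [25, 40, 60]:
    print(k, term(x, k), tail_bound(x, k))
```

With x = 10 the terms climb into the thousands around k = 10 before collapsing, which is why a large x needs a large degree before the error bound becomes useful.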
Applying this to the error bound for the Taylor polynomial, we see that, no matter
what x is, the error

\[ |\sin(x) - T_n(x)| \]

can be made as small as desired by taking n large enough. That is,

\[ \lim_{n \to \infty} |\sin(x) - T_n(x)| = 0. \]


This is true for any x. Of course, for larger x one needs to let n become quite big to 
see the error become small. 

We've seen this in graphs, but let’s do it again even more carefully. I'll ask Sage- 
Math to plot the function and some of the polynomials 


sage: P1=plot(sin(x),(-10,10),color='blue')
sage: P2=plot(sin(x).taylor(x,0,5),(-10,10),ymin=-2,ymax=2,color='red')
sage: show(P1+P2)

I get this:


So $T_5(x)$ gives a good approximation when -π/2 < x < π/2 but it starts being bad
when |x| > π/2. At this scale, we can hardly tell the difference between the two
graphs when x is close to 0. 

Now do the same replacing the 5 by 7: 


As you can see, the degree 7 approximation is closer than the degree 5 approxi- 
mation, both in the sense that it gets closer when x is small and in the sense that 
“small” includes a bigger range of choices for x. This seems to match the sine fairly
well when $-\pi < x < \pi$.
Let’s go the whole hog and replace 7 with 25:




In this range, one can hardly tell the difference. It takes quite a bit of zooming in to 
see that the graphs are not quite identical. 


2.7.2 The cosine function 


Everything we did works just as well for the cosine, except that the Taylor polyno- 
mials now only have even powers of x. We can find them by taking the derivative of 
the polynomials for the sine. Notice that 


$$\frac{d}{dx}\left((-1)^k\frac{x^{2k+1}}{(2k+1)!}\right) = (-1)^k\frac{(2k+1)x^{2k}}{(2k+1)!} = (-1)^k\frac{x^{2k}}{(2k)!},$$


where in the last step we used the fact that (2k + 1)! differs from (2k)! by a factor of 
(2k + 1), which gets canceled out. 
So, we see that the n-th Taylor polynomial for cos(x) centered at zero is

$$T_n(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \cdots + (-1)^k\frac{x^{2k}}{(2k)!},$$

where 2k is the largest even number less than or equal to n. Just as before, the error estimate is

$$|\cos(x) - T_{2n}(x)| \le \frac{|x|^{2n+2}}{(2n+2)!},$$

which goes to zero when $n \to \infty$ no matter what x is. I'll let you play the SageMath
games with plotting. 
2.7.3 The language of convergence


In each of these cases, we have Taylor polynomials $T_n(x)$ that approximate the function
f(x) arbitrarily well as we make the sum longer. The word used for this is convergence.
We will give general definitions later, but let’s say it all in this case.
Think about the infinite sum 


$$x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots + (-1)^n\frac{x^{2n+1}}{(2n+1)!} + \cdots$$

where the dots at the end indicate we go on forever.² To avoid saying “infinite sum”
the standard term is “series” or (redundantly) “infinite series.” The k-th partial sum 
is what you get by stopping when n = k, that is 


$$k\text{-th partial sum} = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots + (-1)^k\frac{x^{2k+1}}{(2k+1)!}.$$

The difference between the k-th partial sum and the sine function is what we have 
been calling “the error.” Since the error goes to zero for any value of x, these partial 
sums always get closer and closer to sin(x) as $k \to \infty$. We say the infinite series
converges to sin(x) for every x and write that as an equality:

$$\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots + (-1)^n\frac{x^{2n+1}}{(2n+1)!} + \cdots \quad\text{for all } x.$$

Similarly, we have

$$\cos(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \cdots + (-1)^n\frac{x^{2n}}{(2n)!} + \cdots \quad\text{for all } x.$$


These two series should be thought of as different ways of writing the function, 
just as 0.3333 ... is another way to write 1/3. The series give a way of approximating 
the values of the sine and the cosine (just take a long enough chunk), but they also 
give us a new way of saying what the sine and cosine are: sin(x) is whatever number 
the partial sums converge to. In other words, saying 


$$\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots + (-1)^n\frac{x^{2n+1}}{(2n+1)!} + \cdots \quad\text{for all } x$$


is a way of saying what sin(x) means. And it has some advantages over other ways, 
the most obvious one being that we can actually compute sin(x) to any desired pre- 
cision using it. 
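
As a concrete reading of "compute sin(x) to any desired precision," here is a hedged Python sketch that keeps adding terms of the series until the error bound falls below a tolerance. The function name `sin_by_series` and the sample inputs are assumptions made for illustration; the stopping rule uses the bound with M = 1 from the error estimate.

```python
from math import factorial, sin

def sin_by_series(x, tol):
    """Add terms of the sine series until the error bound drops below tol.

    Returns the partial sum and the index k of the last term used.
    """
    k = 0
    partial = 0.0
    while True:
        partial += (-1) ** k * x ** (2 * k + 1) / factorial(2 * k + 1)
        # The k-th partial sum has degree 2k+1 and agrees with T_{2k+2}(x),
        # so its error is at most |x|**(2k+3) / (2k+3)!.
        if abs(x) ** (2 * k + 3) / factorial(2 * k + 3) < tol:
            return partial, k
        k += 1

value, last_k = sin_by_series(2.0, 1e-10)
print(value, last_k, sin(2.0))
```

The library value `sin(2.0)` is printed only as a cross-check; the series itself never calls it.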

One should stop for a moment and contemplate how amazing this is. The infor- 
mation that went into constructing the series for sine and cosine was obtained by 
looking very near x = 0. All we used, after all, were the values of the derivatives 
at 0. Yet somehow that is enough to recover sine and cosine for every value of x. 
In other words, if we know a small bit of the graph near x = 0, the entire graph is 
determined! 


² Of course, we can’t literally go on forever, just as we can’t write infinitely many 3s in the decimal
expansion of 1/3. But, as in that case, we know how to go on as long as necessary. We’ll need to define 
what we mean when we talk of infinite sums and convergence. We give the definition informally in this 
section, then return to it in the next chapter. 
2.7.4 Problems


Problem 2.7.1: To construct a table of sines, the ancient astronomer Ptolemy used 
an approximation method to compute the sine of one degree, which is the same as 
$\pi/180 \approx 0.017453$ radians. Use a Taylor polynomial to compute the sine to five
decimal places. 


Problem 2.7.2: We proved that for any value of x the limit of $x^k/k!$ as k goes to
infinity is zero. So if we want $|x^k/k!| < 10^{-7}$ for some fixed x, it is possible to
choose k large enough to make that happen. Suppose x = 10. How large does k need 
to be? (You can either use our proof to get an estimate or use a computer and brute 
force.) 


2.8 The exponential 


The second easiest example is the exponential function $f(x) = e^x$, again with
a = 0. The best part is that in this case we know all the derivatives: $f^{(k)}(x) = e^x$
for every k, and so $f^{(k)}(0) = 1$ for every k. So, the coefficient of $x^k$ in the Taylor
polynomial is just 1/k!.


Theorem 2.8.1. The Taylor polynomial of degree n centered at a = 0 for the function
$f(x) = e^x$ is

$$T_n(x) = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots + \frac{x^n}{n!}.$$

To see whether these polynomials approach $e^x$, we’ll have to look at the error
term, which involves finding a bound for the (n + 1)-st derivative. The (n + 1)-st
derivative is of course $e^x$ again, but now we don’t just want to plug in zero, but
rather to consider $e^t$ for all t between 0 and x. What we are looking for is a number
that we can guarantee is larger than the largest possible value of $e^t$ in that range. The
key is to remember that $e^t$ is an increasing function, so its highest value happens
when we are furthest to the right.


Case 1: x < 0. 


If x < 0 and t is between 0 and x then t < 0 too, so $e^t < 1$. So we can use $M_n = 1$
as our error bound on the (n + 1)-st derivative and we get

$$|e^x - T_n(x)| \le \frac{|x|^{n+1}}{(n+1)!}.$$


As before, factorials beat powers, so we see that the error will converge to zero as 
$n \to \infty$ and we get arbitrarily good approximations as n gets larger.
Case 2: x > 0. 


If x > 0, then we want to consider $e^t$ for 0 < t < x, and so we have $1 < e^t < e^x$.
We don’t know what $e^x$ is equal to, but it isn’t infinite, so there is some number M
that is bigger (for example, we could take $M_n = M = 3^m$, where m is some integer
larger than x). And since all the derivatives are the same, this same $M_n = M$ works
for every n. (This is key. If we couldn’t use the same $M_n$ for every n, the argument
that follows would fail.) So, we get

$$|e^x - T_n(x)| \le M\,\frac{|x|^{n+1}}{(n+1)!}.$$


If M is big, this will take a bit longer to converge to zero, but a fixed M times 
something that goes to zero is still going to zero. So, in the limit, the error goes to 
zero, and so we get arbitrarily good approximations as the degree n grows. 

In the language we introduced before, what we have proved is that the series for 
the exponential always converges to $e^x$:

$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots + \frac{x^n}{n!} + \cdots \quad\text{for all } x.$$

When we computed approximations and errors for $e^x$ above, I asked you to check
that the process was easier for x < 0. That’s what we just saw: when x < 0, we can
take $M_n = 1$ and get an easy bound. When x > 0 it is harder to come up with a
bound, and in fact, for each x, the bound M is a different number (which luckily 
does not depend on n). 


Let’s try to use the series and our knowledge of how the error behaves to compute 
e itself to five decimal places. We know 


$$e = 1 + 1 + \frac{1}{2!} + \frac{1}{3!} + \cdots + \frac{1}{n!} + \cdots$$

The question is how far we have to go, and for that we need a bound on the error.
We found a formula above: the error is bounded by $M_n\,|x|^{n+1}/(n+1)!$. If 0 < t < 1 we
know $e^t < e < 3$, so we take $M_n = 3$ for every n. Since we want x = 1, the factor
$|x|^{n+1}$ is just equal to 1. So, our error bound is

$$|e - T_n(1)| \le 3\cdot\frac{1}{(n+1)!} = \frac{3}{(n+1)!}.$$


We want five correct decimal places, so let’s make sure the error is small enough
for that. If we make sure the error is less than $10^{-6}$, we will know for sure that the
five decimal places are correct. So now we need to choose n so that will happen. For
example, if n = 4, that’s not enough, because our error bound would be 3/5! = 1/40,
which is far too big. So we experiment: make n bigger until the error becomes small
enough. Since I’m lazy, I defined a function and just plugged in some values of n.


sage: err(n)=3.0/factorial(n+1)
sage: err(7)
0.0000744047619047619

sage: err(8)
8.26719576719577e-6
sage: err(9)
8.26719576719577e-7


The e-6 notation means “times $10^{-6}$.” So the smallest n that makes the error
small enough is n = 9. That means we should estimate e by adding up to degree 9:
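$$e \approx 1 + 1 + \frac{1}{2!} + \frac{1}{3!} + \frac{1}{4!} + \frac{1}{5!} + \frac{1}{6!} + \frac{1}{7!} + \frac{1}{8!} + \frac{1}{9!} = 2.71828152557\ldots$$

(Of course I didn’t add that up by hand.) How close is that to e? The difference,
according to SageMath, is $3.0289 \times 10^{-7}$, so we have done better than we expected.
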
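
The same experiment can be sketched in plain Python instead of SageMath. This is only an illustration under stated assumptions: `err` mirrors the bound 3/(n+1)! used above, `partial_sum` is a hypothetical helper for the sum 1 + 1 + 1/2! + ... + 1/n!, and exact fractions are used so that the comparison with $10^{-6}$ is not affected by rounding.

```python
from fractions import Fraction
from math import e, factorial

def err(n):
    """Error bound 3/(n+1)! for |e - T_n(1)|, as an exact fraction."""
    return Fraction(3, factorial(n + 1))

def partial_sum(n):
    """T_n(1) = 1 + 1 + 1/2! + ... + 1/n!, as an exact fraction."""
    return sum(Fraction(1, factorial(k)) for k in range(n + 1))

# Increase n until the bound drops below 10**-6.
target = Fraction(1, 10**6)
n = 0
while err(n) >= target:
    n += 1

approximation = float(partial_sum(n))
print(n, approximation, e - approximation)
```

The loop stops at the first n whose bound is small enough; the actual difference from `math.e` is then printed for comparison with that bound.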


2.8.1 Problems 


Problem 2.8.1: Use a Taylor polynomial for $f(x) = e^x$ near a = 0 to approximate
$1/\sqrt{e}$ to three decimal places. (The key here is to decide which n you need to use.)


2.9 The geometric series 


This example is known as the geometric series. It is very famous for two reasons. 
First, it occurs in a lot of practical applications. Second, we can actually compute the 
exact error and check when it becomes small. (This is a miracle: usually, the exact 
error is not accessible.) 

We start with the function $f(x) = \frac{1}{1-x}$. We worked out its Taylor polynomial
of degree n centered at 0 in section 2.2. We got

$$T_n(x) = 1 + x + x^2 + x^3 + \cdots + x^n.$$

To compute the error, we have to work out the difference

$$\frac{1}{1-x} - (1 + x + x^2 + x^3 + \cdots + x^n).$$


We could use the error estimate, but this is just algebra, so we can avoid doing the 
hard work of computing and estimating derivatives. Let’s just do the algebra. 


To subtract

$$\frac{1}{1-x} - (1 + x + x^2 + x^3 + \cdots + x^n),$$

I need to have a common denominator, which will have to be 1 − x itself. That will
look like this:

$$\frac{1}{1-x} - \frac{(1-x)(1 + x + x^2 + x^3 + \cdots + x^n)}{1-x}.$$
Let’s work out the numerator of the second fraction:


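$$\begin{aligned}
(1-x)(1 + x + x^2 + x^3 + \cdots + x^n)
&= 1\,(1 + x + x^2 + x^3 + \cdots + x^n) - x\,(1 + x + x^2 + x^3 + \cdots + x^n) \\
&= 1 + x + x^2 + x^3 + \cdots + x^n - (x + x^2 + x^3 + x^4 + \cdots + x^{n+1}) \\
&= 1 - x^{n+1}.
\end{aligned}$$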


(That’s an amazing amount of cancellation!) So the error is 


$$\frac{1}{1-x} - (1 + x + x^2 + x^3 + \cdots + x^n) = \frac{1}{1-x} - \frac{1 - x^{n+1}}{1-x} = \frac{x^{n+1}}{1-x}.$$


So we know the error exactly for every x and every n. 

How big is the error? Well, the denominator is fixed once we know x, and the 
numerator is just x to a big power. If |x| > 1 that tells us that the error is large. For
example, if we take x = 2, then the error is $-2^{n+1}$, which gets larger and larger (in
absolute value) as n grows. But if |x| < 1, then we are good: powers of x get very
small, and larger n means smaller error. 
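
Because the error here is exact, it can be checked with exact arithmetic. The sketch below is plain Python using `fractions.Fraction`; the function names and the sample values of x and n are illustrative assumptions. It compares the difference 1/(1 − x) − T_n(x), computed directly, with the closed form $x^{n+1}/(1-x)$.

```python
from fractions import Fraction

def geometric_partial_sum(x, n):
    """T_n(x) = 1 + x + x**2 + ... + x**n."""
    return sum(x ** k for k in range(n + 1))

def exact_error(x, n):
    """The closed form x**(n+1) / (1 - x) for 1/(1-x) - T_n(x)."""
    return x ** (n + 1) / (1 - x)

for x in (Fraction(1, 2), Fraction(-9, 10), Fraction(2)):
    for n in (1, 5, 20):
        direct = 1 / (1 - x) - geometric_partial_sum(x, n)
        # The two expressions agree exactly, not just approximately.
        assert direct == exact_error(x, n)
        print(x, n, float(direct))
```

For x = 1/2 and x = −9/10 the printed errors shrink as n grows, while for x = 2 they grow in absolute value.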


To summarize: if $f(x) = \frac{1}{1-x}$, then

$$T_n(x) = 1 + x + x^2 + \cdots + x^n$$

and we have

$$|f(x) - T_n(x)| = \frac{|x|^{n+1}}{|1-x|}.$$

If |x| < 1, the error goes to 0 as $n \to \infty$. If |x| > 1, the absolute value of the error
goes to infinity as $n \to \infty$.

It’s always good to remember that when A = B then also B = A. In some
practical applications what we actually want to compute is the finite sum of powers
of x. So let’s rearrange our formula and record what we have proved about that.

$$1 + x + x^2 + \cdots + x^n = \frac{1}{1-x} - \frac{x^{n+1}}{1-x} = \frac{1 - x^{n+1}}{1-x}.$$

This is true for every $x \ne 1$. (It’s just algebra.)
The upshot is that we have proved that

$$\frac{1}{1-x} = 1 + x + x^2 + \cdots + x^n + \cdots \quad\text{if } |x| < 1.$$


In that equation, the equals sign has a technical meaning: it means that, for those 
values of x, the series converges to 1/(1—x). If |x| > 1, the series does not converge. 
We say then that the series diverges. 

This is very different from the other examples. For sine, cosine, and exponential, 
we ended up with a series that represented the function for all values of x. That did 
not happen here. Instead, we get a series that matches the function in a certain range, 


when —1 < x < 1, and does not converge at all for other values of x. This makes 
the equals sign even more tricky than before: not only does “equals” really mean 
“converges to” but also comes with the caveat “but only for these values of x.” 

In books, one often finds this series in a slightly different form in which we 
multiply both sides by some constant a. We can write the finite geometric sum as 


$$a + ax + ax^2 + \cdots + ax^n = \frac{a - ax^{n+1}}{1-x}$$

and the geometric series as

$$a + ax + ax^2 + \cdots + ax^n + \cdots = \frac{a}{1-x} \quad\text{if } |x| < 1.$$


Some books will use r instead of x, mainly for historical reasons. 

We’ll go back to the geometric series and why it is useful later on. For now, let 
me just comment that it is pretty much the only case in which we can compute the 
error explicitly, without needing to approximate at all. 


2.9.1 Problems 


Challenge: If we take x = 1/2 our formulas say

$$1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots + \frac{1}{2^n} + \cdots = 2.$$

Can you find a way to draw a picture that shows this?
Problem 2.9.1: Of course, we don’t expect

$$\frac{1}{1-x} = 1 + x + x^2 + \cdots + x^n + \cdots$$

to work when x = 1, because f(1) is undefined. But we could take x = −1, and then
f(−1) = 1/2. Check that in that case the error is always equal to ±1/2, so that it does
not go to zero as n grows, i.e., the series does not converge.


Problem 2.9.2: When we write 1/3 = 0.33333... it may not really be clear what the
right-hand side means. 


a. Explain how to interpret the infinite decimal as a geometric series, and check 
that the equation is indeed true. 


b. What is 0.999999 ... equal to? 


Problem 2.9.3: Suppose we are rolling a six-sided die. We know that each of the 
numbers 1, 2, 3, 4, 5, 6 occurs with the same probability, 1/6. 


a. What is the probability P(1) that we will get a six in the first roll? 
b. What is the probability P(2) that we will not get a six in the first roll, then get
one on the second roll? 


c. Fix an integer n > 0. What is the probability P(n) that the first time we get a
six is on the nth roll? 


d. It is possible that we will get a six only after a very large number of rolls, so 
P(n) is never zero. Show, however, that $\lim_{n\to\infty} P(n) = 0$.


e. The sum of the probabilities of all possible events should be 1. Check that this 
is true for our situation. 


Problem 2.9.4: (A bit of mathematical folklore.) Two cyclists, 120 miles apart, 
approach each other, each pedaling at 10 miles per hour. A fly starts at one cyclist 
and flies back and forth between the cyclists at 15 miles per hour. When the cyclists 
come together (and squash the fly between them), how far has the fly flown? 


Problem 2.9.5: For which values of x do these series converge? 


a. a) b. Yona" 


Challenge: We avoided using our error estimate for Taylor polynomials in this sec- 
tion. What happens if we try to do it that way? 


2.10 The (natural) logarithm 


The logarithm is one of the most useful functions we have, but we have no direct 
way to compute it. In that way, it is much like the exponential, sine, and cosine. In this 
section, we will find a series for it and (without proof) its range of convergence. Naturally,
we will work with ln(x), which is in every way the “right” logarithm function.
Notice that SageMath uses log for the natural logarithm (so do all mathematicians,
except in calculus class).

One issue we have to deal with is that we can’t plug in zero: ln(0) is undefined.
We could instead use a = 1, but we prefer to use series centered at 0, so instead we
will work with the function f(x) = ln(1 + x).

The key thing to remember is that, if f(x) = ln(1 + x), then f′(x) = 1/(1 + x).
But we already know a series for that! If we change x to −x in the geometric series,
we get

$$\frac{1}{1+x} = 1 - x + x^2 - x^3 + \cdots + (-1)^n x^n + \cdots \quad\text{if } |x| < 1.$$


Since Taylor polynomials play nice with derivatives (i.e., using Theorem 2.3.2), we 
can just integrate this to get a candidate series for the logarithm: 


$$\ln(1+x) = C + x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots + (-1)^n\frac{x^{n+1}}{n+1} + \cdots$$

Here C is the arbitrary constant that we always end up with when we take an antiderivative,
but we can easily figure out what the right choice of constant is: if we
plug in x = 0 on both sides, we get 0 = C + 0, so we need C = 0. So the series for
ln(1 + x) should be

$$\ln(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots + (-1)^n\frac{x^{n+1}}{n+1} + \cdots$$


This is the right series; how about convergence? Well, we could try to use the error 
estimate and check, but instead, I'll mention a theorem that settles the issue: inte- 
grating a Taylor series preserves convergence (for more detail, see section 4.3). So 
we conclude 


$$\ln(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots + (-1)^n\frac{x^{n+1}}{n+1} + \cdots \quad\text{if } |x| < 1.$$
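
A quick numerical look at this series, as a plain Python sketch. The name `log_series` and the sample inputs are illustrative assumptions; `math.log1p(x)` computes ln(1 + x) and is used only as a reference value. The input x = 2 lies outside the range |x| < 1, so nothing above promises convergence there.

```python
from math import log1p

def log_series(x, n):
    """Partial sum x - x**2/2 + x**3/3 - ... up to the term in x**(n+1)."""
    return sum((-1) ** k * x ** (k + 1) / (k + 1) for k in range(n + 1))

for x in (0.5, -0.5, 2.0):
    sums = [log_series(x, n) for n in (5, 20, 80)]
    print(x, sums, log1p(x))
```

For x = ±0.5 the partial sums settle down to the reference value, while for x = 2 they grow in absolute value instead of settling.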


Challenge: “We could try to use the error estimate,” he said, but would we succeed? 


2.11 The binomial series 


Suppose p (for “power”) is a real number and $f(x) = (1 + x)^p$. For this to make
sense for general powers, 1 + x needs to be positive, so we need to assume x > −1.

Why would we want a series for this function? Well, consider some special values 
for p: 


• If p = 1/2, we have $f(x) = \sqrt{1 + x}$, which is a function we know and love.


• If p = n is a positive integer, we get $(1 + x)^n$, which is a polynomial of degree
n. Its Taylor series will be the polynomial itself.


• If p = −1 we get f(x) = 1/(1 + x), and we already know that one: it’s basically
a geometric series. But if p = −2 we get $1/(1 + x)^2$, for which we don’t yet
have a series. So taking p to be a negative integer can be interesting.


• Another common choice is p = −1/2, which is relevant because we often have
to deal with things like $1/\sqrt{1 + x}$.


• When we deal with circles, we often run into expressions involving $\sqrt{1 - x^2}$.
We can get a series for this by starting from a binomial series, as we will see 
later. 


Usually, then, p will either be an integer or a fraction.
To find the Taylor polynomials, we use brute force and compute many deriva- 
tives: 


a. First we plug in to find the degree 0 term: $f(0) = (1 + 0)^p = 1$.
b. Next we need the derivative: $f'(x) = p(1 + x)^{p-1}$, so $f'(0) = p$.
c. Now the second derivative: $f''(x) = p(p - 1)(1 + x)^{p-2}$, so $f''(0) = p(p - 1)$.


d. Each time we take another derivative, we get another factor in front and the
power goes down. The factors start at p and then get smaller by integers: p − 1,
p − 2, and so on. After k steps it looks like

$$f^{(k)}(x) = p(p-1)(p-2)\cdots(p-(k-1))\,(1+x)^{p-k}$$

(make sure you check that). So plugging x = 0 into the k-th derivative gives

$$f^{(k)}(0) = p(p-1)(p-2)\cdots(p-(k-1)).$$


That’s a little like a (descending) factorial, except that p doesn’t need to be a 
whole number. 


To get the coefficients of the Taylor polynomials we now divide by the factorials. So
the coefficient of the $x^k$ term will look like

$$\frac{p(p-1)(p-2)\cdots(p-(k-1))}{k!}.$$

That ugly mess is sometimes abbreviated using the notation

$$\binom{p}{k} = \frac{p(p-1)(p-2)\cdots(p-(k-1))}{k!}.$$


This is called a generalized binomial coefficient. If you have met binomial coeffi- 
cients before, check that when p is a positive integer this gives the usual answer. 
An example, for sanity’s sake. Say p = 1/2 and k = 5. Then 


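$$\begin{aligned}
\binom{1/2}{5} &= \frac{(1/2)\,(1/2-1)\,(1/2-2)\,(1/2-3)\,(1/2-4)}{5!} \\
&= \frac{(1/2)(-1/2)(-3/2)(-5/2)(-7/2)}{120} \\
&= \frac{105}{2^5\cdot 120} = \frac{7}{256}.
\end{aligned}$$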
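
The definition translates directly into a few lines of code. This is a sketch in plain Python with exact fractions; the function name `gen_binom` is a hypothetical choice, and `math.comb` (the ordinary binomial coefficient) is used only to check the positive-integer case.

```python
from fractions import Fraction
from math import comb, factorial

def gen_binom(p, k):
    """Generalized binomial coefficient p(p-1)...(p-(k-1)) / k!."""
    numerator = Fraction(1)
    for j in range(k):
        numerator *= Fraction(p) - j   # factors p, p-1, ..., p-(k-1)
    return numerator / factorial(k)

# The worked example with p = 1/2 and k = 5.
print(gen_binom(Fraction(1, 2), 5))

# For a positive integer p the result matches the usual binomial coefficient.
print(all(gen_binom(4, k) == comb(4, k) for k in range(6)))
```

When p is a positive integer and k > p, one of the factors is zero, so the coefficient vanishes; that is why the series stops and becomes a polynomial.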


Using that notation, the Taylor polynomial of degree n for $(1 + x)^p$ is

$$T_n(x) = 1 + px + \frac{p(p-1)}{2!}x^2 + \cdots + \binom{p}{n}x^n.$$

Estimating the error $(1 + x)^p - T_n(x)$ is pretty difficult in this case, so instead I’ll
just tell you the easy part of the answer: if |x| < 1 the error goes to zero as n goes to
infinity, so we get convergence. In other words,

$$(1+x)^p = 1 + px + \frac{p(p-1)}{2!}x^2 + \cdots + \binom{p}{n}x^n + \cdots \quad\text{if } |x| < 1,$$

where you should remember that

$$\binom{p}{n} = \frac{p(p-1)(p-2)\cdots(p-(n-1))}{n!}.$$


As the example with p = 1/2 suggests, it can be quite difficult to work out a general 
term without the binomial notation, but given a specific p you should definitely try 
to do it. 
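
To see the binomial series at work numerically, here is a small Python sketch. The function name `binomial_series` and the test values are illustrative assumptions. Instead of recomputing each coefficient from scratch, it uses the observation that consecutive coefficients satisfy C(p, k+1) = C(p, k)·(p − k)/(k + 1), which follows from the definition.

```python
from math import sqrt

def binomial_series(p, x, n):
    """Partial sum 1 + p*x + ... + C(p, n) * x**n of the binomial series."""
    total = 0.0
    coeff = 1.0                                # C(p, 0) = 1
    for k in range(n + 1):
        total += coeff * x ** k
        coeff = coeff * (p - k) / (k + 1)      # step from C(p, k) to C(p, k+1)
    return total

print(binomial_series(0.5, 0.5, 60), sqrt(1.5))    # p = 1/2, x inside |x| < 1
print(binomial_series(-1, 0.5, 60), 1 / 1.5)       # p = -1: geometric-type series
print(binomial_series(4, 1.0, 10))                 # p = 4: the series is a polynomial
```

In the last line the coefficients become zero after degree 4, so adding more terms changes nothing.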


2.11.1 Problems 


Problem 2.11.1: Just to practice, find the Taylor series for $\sqrt{1 + x}$. (This is p = 1/2,
of course. I told you that you should try!) Try to work out the general term; if that is 
difficult, at least find the first few terms. 


Problem 2.11.2: Suppose p is a positive integer. Then $(1 + x)^p$ should just be a
polynomial of degree p. For example,

$$(x + 1)^4 = x^4 + 4x^3 + 6x^2 + 4x + 1.$$
Check that the formula for the binomial series does what it should. 


Problem 2.11.3: We have two ways to compute the Taylor series for the function 


$$f(x) = \frac{1}{1+x}.$$


We can take the geometric series and replace x with —x, or we can use the binomial 
series with p = —1. Check that both methods give the same answer. 


2.12 Two monsters 


All these pleasant examples might give you the impression that our method 
always works. Alas, this is not true. Here are two cases where things go badly wrong 
in two different ways. In both cases, I won’t try to work out all the details. The goal 
instead is to alert you to the fact that bad things can and do happen. 

Luckily, it takes a little bit of work to find really bad examples. Or maybe it’s not 
luck, but rather that the reason we know and love the functions we know and love 
is that things do work out well for them. Our “everyday functions” are made out of 
polynomials, roots, exponential and logarithm, sine and cosine, all of which have 
Taylor series that behave nicely. 


2.12.1 Convergence, but to the wrong answer 


The first example is the rather simple-looking function $f(x) = e^{-1/x^2}$. We want to
study its behavior near x = 0.
Of course, as written we can’t put x = 0 into the formula, since 1/0 is not defined.
But as $x \to 0$ the exponent $-1/x^2$ goes to $-\infty$, and

$$\lim_{k\to-\infty} e^k = 0.$$

That says that the value $e^{-1/x^2}$ goes to 0 as x goes to 0. So we define f(0) = 0. If
you like, the function we are really working with is

$$f(x) = \begin{cases} e^{-1/x^2} & \text{if } x \ne 0, \\ 0 & \text{if } x = 0. \end{cases}$$


The graph of f(x) looks like this: 




As you can see, the graph is very flat near the origin. It’s straightforward (if 
you are good at using L’Hospital’s rule) but annoying (even if you’re good at using 
L’Hospital’s rule) to check that our function is infinitely differentiable at x = 0. In 
fact, 

$$f'(0) = 0,\quad f''(0) = 0,\quad f'''(0) = 0,\quad \ldots,\quad f^{(n)}(0) = 0,\quad \ldots$$

All the higher derivatives are 0 at x = 0. In other words, the differential calculus
can’t distinguish what this function does at x = 0 from what the constant function
does. Of course, when $x \ne 0$ the derivatives are not zero since f(x) is not actually
constant. 

What does that mean for the Taylor polynomials? Well, the coefficients are given
by $f^{(n)}(0)/n!$, so they are all zero. So

$$T_n(x) = 0 + 0x + 0x^2 + 0x^3 + \cdots + 0x^n$$

is the zero polynomial and the Taylor series is the zero series. So the error is

$$f(x) - T_n(x) = f(x) - 0 = f(x).$$

The error doesn’t depend on n, so it doesn’t get smaller as n grows.

Of course, it is still true that $f(x) \approx 0$ when x is near enough to zero and our
estimates are all still correct, but the polynomials don’t say much more than “the
graph is really very flat.”

The annoying part is this: adding lots of zeros just gives zero, so the $T_n(x)$ “converge”
as $n \to \infty$, but they converge to the zero function g(x) = 0, not to our starting
function f(x). So in this case we get a convergent series, but it converges to the 
wrong thing! 
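
A few numbers make the flatness visible. The sketch below is plain Python; the helper names and the comparison with $x^{10}$ are illustrative assumptions, and the output is numerical evidence only, not a proof that the derivatives at 0 vanish.

```python
from math import exp

def f(x):
    """exp(-1/x**2) for x != 0, and 0 at x = 0."""
    return exp(-1.0 / x ** 2) if x != 0 else 0.0

def zero_taylor_polynomial(x, n):
    """T_n(x) for f at 0: every coefficient is 0, so the value is 0."""
    return 0.0

for x in (1.0, 0.5, 0.2, 0.1):
    error = f(x) - zero_taylor_polynomial(x, 50)
    # f(x) / x**10 shrinking toward 0 shows f is flatter than x**10 near 0.
    print(x, error, f(x) / x ** 10)
```

The error column is just f(x) itself, whatever degree is used, which is the point of the example.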

So this example shows that for some functions the Taylor polynomials get closer
and closer to something, but that something is not the original function. And once
we have one example we have many: consider, for example, $\sin(x) + e^{-1/x^2}$ and its
Taylor series.


2.12.2 No convergence at all 


This situation is at the other extreme. It can happen that the Taylor series does not 
converge at any point except the center. 
Here’s an example. The function is this: 


$$f(x) = \int_0^\infty e^{-t}\cos(t^2 x)\,dt.$$


Since the thing we are integrating depends on x, the answer is a function of x. It 
seems hard to work with a function that is defined as the result of computing an 
integral, but it’s actually straightforward. (Another reason to take a course in real 
analysis!) 

The graph of that function looks like this: 




³ Taken from [5, Section 24].
It turns very quickly at the maximum point, but it’s still a smooth curve.
When we compute the derivatives at x = 0, we get this. The odd-numbered
derivatives are all 0, while the even-numbered derivatives are

$$f^{(2n)}(0) = (-1)^n (4n)!.$$

So the coefficient of $x^{2n}$ in the Taylor polynomials is ±(4n)!/(2n)!. Here is the degree
14 polynomial:

$$\begin{aligned}
T_{14}(x) = {} & 1 - 12x^2 + 1680x^4 - 665280x^6 + 518918400x^8 - 670442572800x^{10} \\
& + 1295295050649600x^{12} - 3497296636753920000x^{14}.
\end{aligned}$$


The coefficients grow very fast, since (4n)! is enormously bigger than (2n)!. If $x \ne 0$,
it certainly doesn’t seem as if these polynomials will converge to anything. In fact,
plotting the degree 14 polynomial (and showing only −1 < y < 1, since otherwise
we'd only see a vertical line) gives this:




The value at x = 0 will always be 1, but as the degree of the polynomial goes up 
the curve gets narrower and narrower. It also flips from positive to negative: the 
polynomial for n = 16 would go upwards from 1 instead.

It’s not hard to show that for any fixed $x \ne 0$ the absolute value of the sum gets
bigger and bigger as we make the degree larger, so the polynomials are not giving us
better and better approximations. In other words, there are Taylor polynomials, but
if $x \ne 0$ they do not converge to anything at all as $n \to \infty$.
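
The growth can be checked directly from the coefficient formula ±(4n)!/(2n)!. The following is a plain Python sketch; the function names and the sample point x = 0.1 are illustrative assumptions. Python integers are exact, so the coefficients are computed without rounding.

```python
from math import factorial

def coefficient(n):
    """Coefficient of x**(2n): (-1)**n * (4n)! / (2n)!, an exact integer."""
    return (-1) ** n * (factorial(4 * n) // factorial(2 * n))

def taylor_value(x, degree):
    """Value at x of the Taylor polynomial of the given even degree."""
    return sum(coefficient(n) * x ** (2 * n) for n in range(degree // 2 + 1))

# The first eight coefficients reproduce the degree 14 polynomial.
print([coefficient(n) for n in range(8)])

# Even at the small input x = 0.1 the values grow in size with the degree.
for degree in (14, 40, 80):
    print(degree, taylor_value(0.1, degree))
```

A smaller x only delays the blow-up; the factorial growth of the coefficients eventually wins for any nonzero x.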


2.12.3 Banning the monsters 


Monsters of both kinds are scary and in a technical sense plentiful.⁴ But in everyday
mathematics, we run into them only rarely. It is good to know they are there, however,
so that we can be ready for them if we are unlucky. 


⁴ A good place to look for the technical information on this, including what I mean by “plentiful,” is
[5, Section 24]. 
From now on, however, we will focus⁵ on what happens when we do have convergence,
at least for some values of x. Functions that are not monsters, so that their
Taylor series exist and converge to them at least for some values of x, are called 
analytic. What our monsters show is that there exist functions that are infinitely dif- 
ferentiable but are not analytic, either because the Taylor series doesn’t converge at 
all or because it converges to the wrong thing. 


2.13 Series to know in your sleep 


To conclude this chapter, let’s collect our positive results. We have found several 
cases where the Taylor polynomials do give better and better approximations for our 
starting function, at least for a certain range of values of x. The basic ones are so 
important that I like to call them the series you should know in your sleep. Should 
someone get into your dorm and shake you in the middle of the night, demanding 
you tell them the series for the sine, you should be able to answer! Knowing a series 
includes knowing the range in which it is valid, so I have included that each time. 


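Exponential

$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots + \frac{x^n}{n!} + \cdots \quad\text{for all } x.$$

Sine

$$\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots + (-1)^n\frac{x^{2n+1}}{(2n+1)!} + \cdots \quad\text{for all } x.$$

Cosine

$$\cos(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \cdots + (-1)^n\frac{x^{2n}}{(2n)!} + \cdots \quad\text{for all } x.$$

Geometric

$$\frac{1}{1-x} = 1 + x + x^2 + \cdots + x^n + \cdots \quad\text{if } |x| < 1.$$

Logarithm

$$\ln(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots + (-1)^n\frac{x^{n+1}}{n+1} + \cdots \quad\text{if } |x| < 1.$$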


⁵ We are “banning the monsters,” to use language due to Imre Lakatos [16].
Binomial

$$(1+x)^p = 1 + px + \frac{p(p-1)}{2!}x^2 + \cdots + \binom{p}{n}x^n + \cdots \quad\text{if } |x| < 1,$$

where

$$\binom{p}{n} = \frac{p(p-1)(p-2)\cdots(p-(n-1))}{n!}.$$




3 Going All the Way: 
Convergence 


We have just found several examples of infinitely long sums which we can plausibly 
say are equal to something. Of course, there is some kind of limit involved, since we
cannot actually add infinitely many numbers. In this chapter, we will see the formal 
definitions and look for criteria that let us decide when convergence happens and 
when it doesn’t. That will allow us to explore, in the next chapter, the convergence 
behavior of power series. 

To see how naturally infinite sums appear, consider this very old example. Imag- 
ine a bar two feet long. We can cut it in half. The left piece will be one unit long: 


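$$2 = 1 + 1.$$

Now split the second piece in half to get a piece of size one half:

$$2 = 1 + \frac{1}{2} + \frac{1}{2}.$$

Keep doing it...

$$\begin{aligned}
2 &= 1 + 1 \\
&= 1 + \frac{1}{2} + \frac{1}{2} \\
&= 1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{4} \\
&= 1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots + \frac{1}{2^n} + \frac{1}{2^n}
\end{aligned}$$

for every n. Does it make sense to do it infinitely many times and end up with

$$1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots + \frac{1}{2^n} + \cdots = 2\,?$$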


Well, not if we actually think of adding infinitely many things; we don’t have that 
much time. But yes if we are willing to define the equality in terms of increasingly 
close approximations. Indeed, this equality is what we get when we take x = 1/2 in 
the geometric series. 


3.1 Definitions 


So far we have only looked at the convergence of Taylor series, for which there 
is the complicating factor of the variable x. Sometimes a Taylor series converges for 
some values of x and diverges for others. For the general theory, let’s just assume
that we have something that looks like the sum of infinitely many numbers. (That is 
the situation once we have plugged a number in for x, after all.) 

To help understand the formal definitions, let’s use as a running example the sum 
we just wrote down: 


$$1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots + \frac{1}{2^n} + \cdots = 2.$$

This section is mostly about giving a precise meaning to that equals sign. 
A series is an infinite sum 


$$a_0 + a_1 + a_2 + a_3 + \cdots + a_n + \cdots$$

The numbers being summed, $a_1$, $a_2$, etc., are called the terms of the series. So in our
geometric example, the terms are $a_n = 1/2^n$. A formula for the terms that is valid
for all n is often called the general term.

The numbering of the terms is arbitrary. We could, for example, have called the
first term $b_1$ instead. But then the general term will change: $b_n = 1/2^{n-1}$. We could
even be perverse and call the first term $c_{17}$ if we wanted to.

Given a series, we look at the partial sums that we get by adding up to a point:

$$S_k = a_0 + a_1 + a_2 + \cdots + a_k.$$


In general it will be hard to work out what exactly the partial sums are equal to, but, 
as we saw above, geometric series are miraculous. So, in our geometric example, we 
know that 


$$S_3 = 1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} = \frac{15}{8}$$

and even

$$S_k = 1 + \frac{1}{2} + \frac{1}{4} + \cdots + \frac{1}{2^k} = 2 - \frac{1}{2^k},$$


either by the computation we just did or by using our formula for the sum of a finite 
geometric sum. 

For non-geometric series we won’t be able to compute $S_k$, and we certainly won't
be able to compute an error term. So we will need to come up with ways to figure
out what the $S_k$ do even when we cannot work them out exactly.

One caution: we have to take the terms in order when making partial sums. So,
for example, if we add only some even-numbered terms as in $a_0 + a_2 + a_4$ then this
is not a partial sum in the mathematical sense, even though it’s a sum of part of the 
terms. Our partial sums always include the entire initial chunk of the series. 

Now we consider what happens to the partial sums as they get longer, i.e., as
$k \to \infty$. If they converge to a limit, we say that the series converges and that the
limit is the sum of the series. So if

$$\lim_{k \to \infty} S_k = S$$

then we say the series converges to $S$ and write

$$a_0 + a_1 + a_2 + a_3 + \dots + a_n + \dots = S.$$

In our geometric example, we know that $S_k = 2 - \frac{1}{2^k}$, which converges to 2.
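To watch this convergence happen numerically, here is a minimal Python sketch. The function name `geometric_partial_sum` is made up for this illustration; exact fractions are used so that the gap between 2 and each partial sum can be compared with $1/2^k$ without rounding.

```python
from fractions import Fraction


def geometric_partial_sum(k):
    """Return S_k = 1 + 1/2 + 1/4 + ... + 1/2**k as an exact fraction."""
    return sum(Fraction(1, 2 ** n) for n in range(k + 1))


for k in (3, 10, 20):
    s_k = geometric_partial_sum(k)
    # The gap 2 - S_k is exactly 1/2**k, so it shrinks as k grows.
    print(k, s_k, float(2 - s_k))
```

Whatever error tolerance you pick, the printed gap eventually drops below it, which is exactly what the limit statement claims.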

The limits that appear here are just like the ones you have met before. What they
mean is that the difference between the partial sums $S_k$ and the limit $S$ can be made
as small as I want by choosing $k$ large enough. So when I claim the limit is $S$ what
I am really claiming is that, no matter what your error tolerance is, I can satisfy it
by making $k$ large enough, i.e., by proving something like “if you choose $k$ bigger
than..., then....”

To see what can go wrong, look first at 


$$1 + 1 + 1 + 1 + \dots,$$

that is, a series with terms $a_n = 1$ for every $n$. In this case $S_k$ just keeps getting bigger
as $k$ gets bigger, and doesn’t converge to anything. The series diverges. (Since we do
know the partial sums get bigger and bigger, people sometimes say it “diverges to
infinity.”)

Another example is to have $a_n = (-1)^n$. Then the series is

$$1 - 1 + 1 - 1 + 1 - 1 + \dots$$

The partial sums are all equal either to 1 or to 0, so again they don’t have a limit as
$k \to \infty$. Suppose, for example, we try $S = 1/2$. Then the difference between $S_k$
and $S$ is always either $-1/2$ (if $S_k = 0$) or $1/2$ (if $S_k = 1$). No matter how large I
make $k$ that distance is always 1/2. Something similar happens for any potential limit $S$:
since the values 0 and 1 are a distance 1 apart, at least one of them is at least 1/2 away
from $S$. So this series diverges too, oscillating instead of going to infinity.

These are all very simple examples, but things can and do get more subtle. For 
example, look at this one: 

$$1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \dots + \frac{1}{n} + \dots$$
If you don’t already know the answer, it is fairly difficult to decide whether this 
converges or not. 

Of course the Taylor series for sine, cosine, and exponential converge to the sine, 
cosine, and exponential for any input x, so they provide many examples of convergent 
series: just choose a value for x. More interesting, perhaps, is the geometric series. 
For example, take x = 1/10 to get 

$$1 + \frac{1}{10} + \frac{1}{10^2} + \frac{1}{10^3} + \dots = \frac{1}{1 - \frac{1}{10}} = \frac{10}{9}.$$

Now multiply by $\frac{9}{10}$. We get

$$\frac{9}{10} + \frac{9}{10^2} + \frac{9}{10^3} + \dots = 1,$$

that is,

$$0.99999\ldots = 1.$$


As you see, there really is a link between decimal expansions and series. 


3.1.1 Problems 


Problem 3.1.1: Does the series 
1 1 1.1 21 ~=1~«1 


pee a ate el a. 
2 og gs a a Se 


converge or diverge? If it converges, what is the sum? 


Problem 3.1.2: Does the series

$$\frac{1}{2} + \frac{1}{6} + \frac{1}{12} + \dots + \frac{1}{n(n+1)} + \dots$$

converge or diverge? If it converges, what is the sum?


3.2 Some basic properties of convergence 


We defined convergence using partial sums and taking a limit, so the usual
properties of limits can be used. Suppose, for example, that the series

$$a_0 + a_1 + a_2 + \dots + a_n + \dots$$

converges. Then we can conclude that the series

$$5a_0 + 5a_1 + 5a_2 + \dots + 5a_n + \dots$$

also converges, since the partial sums of the second series are always equal to five
times the partial sums of the first. Since

$$\lim_{n \to \infty} 5S_n = 5 \lim_{n \to \infty} S_n,$$

we can even conclude that the sum of the second series is five times the sum of the
first. Scaling does not affect convergence, though of course it changes what the sum
is.

Something similar works for adding. Suppose we have two convergent series 


$$a_0 + a_1 + a_2 + \dots + a_n + \dots = A$$

and

$$b_0 + b_1 + b_2 + \dots + b_n + \dots = B,$$

where of course the equal signs mean “converges to.” What can we say about the
series whose terms are the sums $a_n + b_n$? Well, if we look at partial sums then

$$(a_0 + b_0) + (a_1 + b_1) + \dots + (a_k + b_k) = (a_0 + a_1 + \dots + a_k) + (b_0 + b_1 + \dots + b_k),$$

since these are just finite sums. Now remember that the limit of a sum is just the sum
of the limits (as long as the limits exist). So we see that

$$(a_0 + b_0) + (a_1 + b_1) + (a_2 + b_2) + \dots + (a_n + b_n) + \dots = A + B.$$


Of course this only works when both series converge. 
Here’s a more interesting observation. Suppose we have a series 


$$a_0 + a_1 + a_2 + \dots + a_n + \dots$$

and we somehow know that if we lop off the first two terms the resulting thing converges:

$$a_2 + a_3 + \dots + a_n + \dots = S.$$

The partial sums of the complete series are

$$a_0 + a_1 + a_2 + \dots + a_k = (a_0 + a_1) + (a_2 + \dots + a_k),$$

so we are just adding a fixed number $(a_0 + a_1)$ to each partial sum. So the sum of the
whole series is

$$a_0 + a_1 + a_2 + \dots + a_n + \dots = a_0 + a_1 + S.$$


The slogan for this is “convergence depends only on the tail.” If we remove the first 
two (or, for that matter, the first one million) terms, we change the limit but we don’t 
change the convergence behavior. 


3.3. Convergence: an easy “no” 


There is an easy condition that lets us say for sure that a series does not converge. 
Here it is. 
Suppose you have a series 


$$a_0 + a_1 + a_2 + a_3 + \dots + a_n + \dots$$

that does converge to a limit $S$. Then when $k$ is large enough the partial sums $S_k$ are
very close to $S$. So we have

$$S - (a_0 + a_1 + a_2 + a_3 + \dots + a_{k-1} + a_k) = \text{small}$$

and indeed the small difference goes to zero as $k$ increases. We also have

$$S - (a_0 + a_1 + a_2 + a_3 + \dots + a_{k-1}) = \text{small}.$$

Two things that are both close to $S$ will be close to each other, so

$$(a_0 + a_1 + a_2 + \dots + a_{k-1} + a_k) - (a_0 + a_1 + a_2 + \dots + a_{k-1}) = \text{small}.$$

(If I wanted to be annoyingly precise I would say that the difference is at most 2 times
small; do you see why?) But the difference between those sums is exactly $a_k$. So we
see that:

Theorem 3.3.1. If the partial sums get arbitrarily close to $S$ then the terms $a_n$ get
arbitrarily close to zero.


Or, more formally, 


Theorem 3.3.2. If a series

$$a_0 + a_1 + a_2 + a_3 + \dots + a_n + \dots$$

converges then

$$\lim_{n \to \infty} a_n = 0.$$

This gives us a negative criterion:

Theorem 3.3.3. If $\lim_{n \to \infty} a_n$ does not exist or exists and is not equal to zero, then the
series $a_0 + a_1 + a_2 + \dots + a_n + \dots$ does not converge.


We have already seen two examples of this: both 
$$1 + 1 + 1 + \dots + 1 + \dots$$

and

$$1 - 1 + 1 - 1 + 1 - 1 + \dots + (-1)^n + \dots$$


diverge, since in the first case the limit of the terms is 1 and in the second case the 
limit of the terms does not exist. 

But be careful! That tells us nothing at all about what happens when the limit of 
the terms is zero. In fact, as we will show, in that case the series might or might not 
converge. Just understanding the limit of the terms isn’t enough to settle the issue. 

A small warning: in this situation (and in others soon to come up) there are two
limits going on. The first is the limit of the terms $a_n$. The second is the limit of the
partial sums, which, if it exists, is the sum of the series. Students are always tempted
to use phrases like “it converges.” The problem is that here there are two different
“it”s! Our last theorem says “if the terms do not converge to zero, then the series
does not converge to anything.” Never use “it” unless it’s clear what “it” is. In the
following, I will try to use “converge” and “diverge” only in reference to series, and 
say “the limit exists” or “does not exist” when I am talking about some other limit. 


3.4 Convergence: when we can compute the partial sums


There are very few cases where we can just compute the partial sums exactly,
in which case it is often easy to decide whether a series converges. One of them we
have already met: the geometric series

$$1 + x + x^2 + x^3 + \dots + x^n + \dots$$

converges to $1/(1 - x)$ as long as $|x| < 1$. The reason we know that is that we can
actually compute the partial sums explicitly. We saw that

$$1 + x + x^2 + \dots + x^n = \frac{1}{1-x} - \frac{x^{n+1}}{1-x},$$

so we can see the convergence happen when $x^{n+1} \to 0$.
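As a quick sanity check of the partial-sum formula (a worked instance chosen here for illustration), take $x = 1/2$ and $n = 3$. The result agrees with the value $15/8$ found by direct addition in the geometric example of Section 3.1:

```latex
1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8}
  = \frac{1}{1 - \frac{1}{2}} - \frac{\left(\frac{1}{2}\right)^{4}}{1 - \frac{1}{2}}
  = 2 - \frac{1}{8}
  = \frac{15}{8}.
```

The subtracted piece, $x^{n+1}/(1-x)$, is the only part that depends on $n$, which is why the behavior of $x^{n+1}$ decides everything.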

There are only a few other examples where we can compute partial sums explic- 
itly. The most common one is when there is cancellation. For example, suppose we 
have the divergent series 


$$1 - 1 + 1 - 1 + 1 - 1 + \dots$$

Then we see at once that if we add an odd number of terms we get 1 and if we add
an even number of terms we get 0.

Of course, we always have to work with finite sums in order to find partial sums. 
Indeed, we should always keep in mind that infinite sums are not really sums at all, 
they are limits, and so weird things can happen. 

Here’s a more interesting example of (nicely hidden) cancellation. Look at the 
series 


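$$\frac{1}{2} + \frac{1}{6} + \frac{1}{12} + \dots + \frac{1}{n(n+1)} + \dots$$

All the terms are positive, so there’s no cancellation to consider yet. But suppose
we notice something. (Leibniz was the one who first did this, I think, so I’m not
suggesting it’s easy to think of it!)

$$\frac{1}{n(n+1)} = \frac{1}{n} - \frac{1}{n+1}.$$

Let’s use that on a partial sum:

$$S_k = \frac{1}{2} + \frac{1}{6} + \frac{1}{12} + \dots + \frac{1}{k(k+1)} = \left(1 - \frac{1}{2}\right) + \left(\frac{1}{2} - \frac{1}{3}\right) + \left(\frac{1}{3} - \frac{1}{4}\right) + \dots + \left(\frac{1}{k} - \frac{1}{k+1}\right).$$

Almost everything cancels, so we get

$$S_k = 1 - \frac{1}{k+1}.$$

Now it’s easy to see that the limit as $k \to \infty$ is 1. So we have shown that

$$\frac{1}{2} + \frac{1}{6} + \frac{1}{12} + \dots + \frac{1}{n(n+1)} + \dots = 1.$$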

This situation, when each term turns out to be equal to a difference and we get can- 
cellation of all the middle terms, is called “telescoping.” So this is an example of 
a telescoping series. They are somewhat artificial, but every so often a useful one 
shows up. Leibniz was interested in this one because the denominators are exactly 
twice the triangular numbers. 
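The telescoping can be checked with exact arithmetic. The following Python sketch is only an illustration (the function name `telescoping_partial_sum` is invented here): it adds the terms $1/(n(n+1))$ one at a time and prints the result next to the closed form $1 - 1/(k+1)$.

```python
from fractions import Fraction


def telescoping_partial_sum(k):
    """Return 1/(1*2) + 1/(2*3) + ... + 1/(k*(k+1)) as an exact fraction."""
    return sum(Fraction(1, n * (n + 1)) for n in range(1, k + 1))


for k in (1, 2, 3, 10, 100):
    # The two printed fractions should agree for every k.
    print(k, telescoping_partial_sum(k), 1 - Fraction(1, k + 1))
```

Using `Fraction` rather than floating point means the agreement is exact, not just approximate.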
3.5 Alternating series: a slightly harder “yes”


The example we want to consider here is of a kind of series for which we can
prove convergence even though we don’t know what the limit is. In other words, we
are going to consider certain series for which we can prove $\lim S_k$ exists even though
we do not know what the sum $S$ should be.

Let’s look at the series

$$1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \dots + \frac{(-1)^n}{n+1} + \dots$$

The general term looks that way because my labeling starts at $a_0 = 1$, forcing me
to say $a_n = \frac{(-1)^n}{n+1}$. We could also start at 1; that would change the formula for the
general term, but not the overall series.
The features we will be using are three: 


• The terms alternate in sign.
• The absolute values of the terms are decreasing.
• The limit of the terms is zero.


Any series with these properties will behave the same way, but we’ll use our specific 
series to make things more concrete. 

As we know, deciding on convergence means looking at the partial sums. So 
let’s. 


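• $S_0 = 1$ and $S_1 = 1 - \frac{1}{2} = \frac{1}{2} = 0.5$.

• $S_2 = 1 - \frac{1}{2} + \frac{1}{3} = 0.8333$ and $S_3 = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} = 0.583333$.

• $S_4 = 0.783333$ and $S_5 = 0.616666$.

[Figure: number line with the partial sums marked, from left to right: $S_1 = 0.5$, $S_3$, $S_5$, $S_4$, $S_2$, $S_0 = 1$.]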


What do we see? It seems that the even-numbered sums (the ones that end with 
adding something) are getting smaller. It seems that the odd-numbered sums (the 
ones that end with a subtraction) are getting bigger. This is promising, since it seems 
that there must be a limit in the middle there somewhere. Of course, we have to 
convince ourselves that the pattern will continue forever. 


The even-numbered partial sums: We know that

$$S_0 = 1$$

$$S_2 = 1 - \frac{1}{2} + \frac{1}{3} = S_0 - \frac{1}{2} + \frac{1}{3} = 0.833$$

$$S_4 = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} = S_2 - \frac{1}{4} + \frac{1}{5} = 0.78333$$

How do we get $S_6$? Well, we take $S_4$ and add two more terms to it. So

$$S_6 = S_4 - \frac{1}{6} + \frac{1}{7}.$$

Notice that we subtract more than we add, so the answer gets smaller. This always
happens, because each term has smaller absolute value than the previous one.

In general,

$$S_{2n} = S_{2n-2} - \frac{1}{2n} + \frac{1}{2n+1} < S_{2n-2}$$

because $1/(2n+1)$ is smaller than $1/2n$. So the even-numbered partial sums form a
decreasing sequence

$$S_0 > S_2 > S_4 > \dots$$


The odd-numbered sums: This is the same, but we add first, then subtract, so we 
add more than we subtract, since we know that the absolute values of the terms are 
getting smaller. The general formula is 


$$S_{2n+1} = S_{2n-1} + \frac{1}{2n+1} - \frac{1}{2n+2} > S_{2n-1}.$$

In other words, the odd-numbered partial sums form an increasing sequence

$$S_1 < S_3 < S_5 < \dots$$


Do they ever overlap? We know that we have

$$S_0 > S_2 > S_4 > \dots$$

and

$$S_1 < S_3 < S_5 < \dots$$

But we also have, at the beginning at least, that $S_1 < S_0$. So the picture is that
we have a large starting point $S_0$ after which the even ones get smaller and a small
starting point $S_1$ after which the odd ones get bigger. So at first evens are bigger than
odds but they are marching toward each other. Do they ever overlap? For example,
could it happen that by the time we get to $S_{100}$ the evens have gotten past one of the
odds? Could $S_{100}$ be smaller than $S_{53}$, for example?

[Figure: number line showing the hypothetical ordering in question, from left to right: $S_1 = 0.5$, $S_3$, $S_5$, $S_{100}$, $S_{53}$, $S_4$, $S_2$, $S_0 = 1$.]

Here’s how to see that it can’t happen. (This is tricky; watch my hands!) When
we add up to get $S_{100}$, the last thing we do is add something (we add $\frac{1}{101}$, but that
doesn’t matter). So to get $S_{101}$ we will subtract. So we know that $S_{101} < S_{100}$. But
we also know that the odd-numbered sums are increasing, so $S_{53} < S_{101}$. The two
inequalities together show that $S_{53} < S_{100}$. So it cannot happen that $S_{100}$ is smaller.

Generalizing, we see that any odd-numbered sum is smaller than any of the even- 
numbered sums. 


How far apart are they? Now we have a very interesting picture. The odd-numbered 
sums start small and get bigger, but never bigger than any of the even-numbered 
ones. The even-numbered sums start big and get smaller, but never smaller than any 
of the odd-numbered sums. So only two things can happen: either both sequences 
converge to the same limit in the middle or each converges to a separate limit. We 
want to convince ourselves that the latter is impossible. 


[Figure: number line showing $S_1 = 0.5$, $S_3$, $S_5$, $S_{4001}$, then the hypothetical limits $O$ and $E$ separated by a gap, then $S_{20000}$, $S_4$, $S_2$, $S_0 = 1$.]

Suppose the even-numbered sums converge to a limit $E$ and the odd-numbered
ones converge to a different limit $O$. Of course we must have $O < E$. All the even
sums will be above $E$, approaching it from above, while all the odd sums will be
below $O$, approaching it from below. That means that the distance between an odd
sum and an even sum is always at least $E - O$. But that can’t be! The distance between
$S_{2n}$ and $S_{2n+1}$ is exactly $1/(2n + 2)$, which goes to zero as $n$ gets bigger. So it is
impossible to have two different limits $O$ and $E$. Instead, the two sums must converge
to the same limit, which means all the sums get close to the same number $S$, and the
series converges.

This argument has shown that if a series has the three properties we highlighted 
(alternating, terms decreasing in absolute value with limit zero), then it converges. 


And one more thing! We actually get a bit more information that is worth recording.
For any $n$ the limit $S$ must be between $S_n$ and $S_{n+1}$, because one is an odd sum and
the other is an even sum. Since $S_{n+1} = S_n \pm \frac{1}{n+2}$ we see that

$$|S - S_n| < \frac{1}{n+2}.$$

The picture shows the situation when $n$ is even.

[Figure: number line showing $S_1 = 0.5$, $S_3$, $S_5$, $S_{2n+1}$, $S$, $S_{2n}$, $S_4$, $S_2$, $S_0$; the distance from $S$ to $S_{2n}$ is less than $1/(2n+2)$.]

Or, to make it more general, if the terms $a_n$ have the properties we are assuming,
then the series converges to a sum $S$, and we have

$$|S - S_n| < |a_{n+1}|.$$

In English, the difference between the limit and the n-th partial sum is smaller than
the first unused term.
In our example, we have shown that the series

$$1 - \frac{1}{2} + \frac{1}{3} - \dots$$

converges. We don’t know what the sum is. But we do know that if we take a partial
sum, say

$$1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \dots - \frac{1}{100} = 0.688172,$$

then the difference between that number and the real value of the series is at most
equal to the first term we did not use, which is 1/101 = 0.0099. That tells us that
this partial sum should agree with the final sum to at least two decimal places. In
fact, the true sum is approximately 0.693147, and the error we make by stopping at
the 100th term is about 0.005.
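These numbers are easy to reproduce. Here is a small Python sketch, offered as an illustration rather than anything definitive; the names `alternating_partial_sum` and `TRUE_SUM` are invented for it, and `TRUE_SUM` is simply the approximate value 0.693147 quoted above, not something the code derives.

```python
def alternating_partial_sum(n):
    """Return S_n = a_0 + ... + a_n, where a_j = (-1)**j / (j + 1)."""
    return sum((-1) ** j / (j + 1) for j in range(n + 1))


TRUE_SUM = 0.693147  # approximate value of the sum, as quoted in the text

s_99 = alternating_partial_sum(99)  # 100 terms, the last one being -1/100
error = abs(TRUE_SUM - s_99)
bound = 1 / 101  # absolute value of the first unused term

print(round(s_99, 6), round(error, 6), error < bound)
```

The error of about 0.005 sits comfortably inside the guaranteed bound of 1/101, roughly half of it, so the bound is honest but not tight.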

Of course, it’s very unsatisfying to just say “the series converges to something” 
if we can’t decide what that something is. Can you guess what it is in our example? 
(Hint: take one of the series you know in your sleep and push the boundaries.) 

OK, here’s the upshot, then: 


Theorem 3.5.1 (Alternating Series). Suppose we have a series

$$a_0 + a_1 + a_2 + \dots$$

such that
a. The terms alternate in sign.
b. The absolute values $|a_n|$ of the terms are decreasing.
c. The limit of the terms is zero, i.e., $\lim_{n \to \infty} a_n = 0$.

Then we can conclude
a. The series converges to a limit $S$.
b. The difference between $S$ and the partial sum $a_0 + a_1 + \dots + a_n$ is smaller than
the absolute value of the first term we did not use, i.e.,

$$|S - (a_0 + a_1 + \dots + a_n)| < |a_{n+1}|.$$

This is particularly nice because many of the series we know in our sleep are
alternating! The series for sin(x), cos(x), and ln(1 + x) all alternate in sign (the last
one only if x > 0) and, since they converge, the terms tend to zero. Because we did
all that work using the Taylor error bound, we know what they converge to. As long
as the terms are decreasing, we can use this theorem to conclude that the error is
bounded by the first unused term. That is much easier than trying to find the darned
$M_n$.


3.5.1 Problems 


Problem 3.5.1: There is a big difference between “converges” and “converges
quickly”! The series in this section provides an example. We saw that after adding
100 terms all we could say was that the partial sum

$$1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \dots - \frac{1}{100} = 0.688172$$

was no more than 0.005 away from the true answer. All we got from adding 100
terms were two decimal places.

To see that it is not just a matter of our error estimate being bad, we can use the
fact that

$$1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \dots = \ln(2).$$

Suppose we want the answer with error less than $10^{-6}$. How many terms do we need
to add?


Problem 3.5.2: One can prove that

$$1 - \frac{1}{4} + \frac{1}{9} - \frac{1}{16} + \frac{1}{25} - \frac{1}{36} + \dots = \frac{\pi^2}{12}.$$

How many terms does one have to add in order to be sure to get $\pi^2/12$ with error
less than $10^{-6}$?


3.6 The harmonic series: a not-so-easy “no” 


The easy way to tell that a series will not converge is to see that its terms do 
not go to zero. But, as we emphasized, that only works to show something does not 
converge. In this section, we will look at an example of a series whose terms do go 
to zero, but which diverges anyway. This famous example is known as the harmonic 
series. 

Let’s take the very simple series 

$$1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \dots + \frac{1}{n} + \dots$$

This is just like the example we studied in the previous section, but now all the terms
are positive, so we don’t get any of the cancellation we saw before. In fact, now the
partial sums do not oscillate.

Let’s compute a few partial sums:

$$S_1 = 1, \quad S_2 = 1.5, \quad S_3 = 1.833, \quad S_4 = 2.0833, \quad S_5 = 2.2833, \dots$$

They keep getting bigger. That is to be expected, since each time we are adding a
positive number $1/n$. That number gets very small as we keep going, but it is always
positive, so each sum is a little bigger than the previous one. In other words, the
partial sums are always increasing:

$$S_1 < S_2 < S_3 < S_4 < \dots < S_n < S_{n+1} < \dots$$


We can make a picture like the ones in the previous section: 


[Figure: number line showing the increasing partial sums $S_1$, $S_2$, $S_3$, $S_4$, ….]

The partial sums increase all the time; the gap between two of them is equal to 
the last term we added, which in this case is just 1/n. So the gaps get smaller as we 
go along. 
What can happen? It might help to look at an example from single-variable
calculus first. Let’s graph two increasing functions. In blue, we have $f(x) = \ln(x)$, while
in green we have $g(x) = 1 - 1/x$.

[Figure: graphs of $f(x) = \ln(x)$ (blue) and $g(x) = 1 - 1/x$ (green).]

Both functions increase as x grows, and both increase less and less, which is like
our partial sums that always increase but with gaps that get smaller. But the logarithm
increases forever, while $1 - 1/x$ is asymptotic to $y = 1$:

$$\lim_{x \to +\infty} \ln(x) = +\infty, \qquad \lim_{x \to +\infty} \left(1 - \frac{1}{x}\right) = 1.$$

Those are the only two things that can happen for an increasing function.

The same thing is true for increasing partial sums: either they approach a limit 
from below, like the asymptotic function, or they keep growing and go to infinity. In 
the latter case, there is no limit and the series does not converge. 

To show that for the harmonic series the partial sums do keep getting bigger,
we’ll use an idea that goes back to Nicole Oresme, writing around 1350. He grouped
the series into chunks ending at a power of 2:

$$1 + \frac{1}{2} = \frac{3}{2}$$

$$\frac{1}{3} + \frac{1}{4} > \frac{1}{4} + \frac{1}{4} = \frac{1}{2}$$

$$\frac{1}{5} + \frac{1}{6} + \frac{1}{7} + \frac{1}{8} > \frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8} = \frac{1}{2}$$

$$\frac{1}{9} + \frac{1}{10} + \frac{1}{11} + \dots + \frac{1}{15} + \frac{1}{16} > 8 \cdot \frac{1}{16} = \frac{1}{2}$$

$$\frac{1}{17} + \frac{1}{18} + \frac{1}{19} + \frac{1}{20} + \dots + \frac{1}{31} + \frac{1}{32} > 16 \cdot \frac{1}{32} = \frac{1}{2}$$

Adding these up chunk by chunk gives

$$S_1 = 1, \quad S_2 = 1 + \frac{1}{2}, \quad S_4 > 1 + \frac{2}{2}, \quad S_8 > 1 + \frac{3}{2}, \quad S_{16} > 1 + \frac{4}{2}, \quad \text{etc.}$$

and in general, for every $n \ge 2$,

$$S_{2^n} > 1 + \frac{n}{2}.$$

This shows that the partial sums get arbitrarily big, since $\frac{n}{2} \to \infty$. In other words,


Theorem 3.6.1. The harmonic series diverges.


On the other hand, the estimate $S_{2^n} > 1 + \frac{n}{2}$ suggests that it takes quite a while,
since it says that doubling the number of terms adds at least 1/2 to the partial sum.
That means that the estimate is growing like the logarithm of the number of terms.

For example, if we want a partial sum that is bigger than $10 = 1 + \frac{18}{2}$, our estimate
says we need to add up to $n = 2^{18} = 262{,}144$.

Notice, however, that all we know is that adding $2^{18}$ terms is enough. Maybe our
estimate for n is too big, and we can get a partial sum greater than 10 much sooner
than that. Let’s use SageMath to check.


sage: var('n')
n
sage: sum(1.0/n,n,1,2^18)
13.053866822328144


So we overshot; this partial sum is more than 10, but we could have taken fewer
terms. What this means is each partial sum is actually bigger than our estimate
suggests, but it’s hard to tell by how much. That makes them diverge even faster, of
course, but it would be nice to have a more precise estimate. The exercises below
pick up on this.
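For readers without SageMath, the same comparison can be sketched in plain Python. This is only an illustration under simple assumptions: the helper name `harmonic_partial_sum` is made up, and ordinary floating-point addition is used, so the last digits may differ slightly from the SageMath output above. The loop prints each $S_{2^n}$ next to Oresme's lower bound $1 + n/2$.

```python
def harmonic_partial_sum(m):
    """Return S_m = 1 + 1/2 + ... + 1/m as a float."""
    return sum(1.0 / j for j in range(1, m + 1))


for n in range(2, 19):
    s = harmonic_partial_sum(2 ** n)
    lower = 1 + n / 2  # Oresme's lower bound for S_{2^n}
    # Columns: n, number of terms, partial sum, lower bound, bound holds?
    print(n, 2 ** n, round(s, 4), lower, s > lower)
```

The gap between the two numeric columns widens as n grows, which is the overshoot described above.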

Meanwhile, the thing to remember is that the harmonic series 


$$1 + \frac{1}{2} + \frac{1}{3} + \dots + \frac{1}{n} + \dots$$

does not converge even though $\frac{1}{n} \to 0$. Looking at the general term is not enough to
decide on convergence.


3.6.1 Problems 


Problem 3.6.1: Our attempt, with $n = 2^{18} = 262{,}144$ terms, gave much too big an
answer. Play around with SageMath to find out how many terms are actually needed
to make $S_n > 10$. Is the answer surprising?


Problem 3.6.2: To get a more precise error term, let’s look more carefully at the
harmonic series

$$\sum_{n=1}^{\infty} \frac{1}{n} = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \dots$$

We’ll use a graphical approach to get an approximation to the partial sums.
First, notice that $1/n$ is what you get when you plug in $x = n$ into the function
$f(x) = 1/x$, whose graph is drawn on the left below:

[Figure: on the left, the graph of $f(x) = 1/x$; on the right, the same graph with a box of width 1 and height $1/n$ drawn at each $x = n$.]

Now, for each $n$, draw a box of height $1/n$ and width 1, with its bottom left corner
at the point $x = n$, as in the second graph.

a. What is the area of the n-th box? What is the sum of the areas of the first k 
boxes? 


b. Use your answer to write down an integral that is smaller than the k-th partial 
sum of the harmonic series. That is, it gives a lower bound for the k-th partial 
sum. 


c. Compute your integral, and check that its limit, as k goes to infinity, is infinity. 
Since the partial sums of the series are larger, this means they go to infinity 
too. 


d. Can you use a similar picture to find an upper bound for the k-th partial sum? 


e. Explain why In(k) is a pretty good estimate for the k-th partial sum. 


3.7 Series with positive terms, part 1 


There is one observation we made when analyzing the harmonic series that
generalizes and is useful. It applies whenever we have a series where all of the terms are
positive and leads to another way to decide whether such a series converges or not,
known as the comparison test.

If we have a series

$$a_1 + a_2 + a_3 + \dots + a_n + \dots$$

with all of the terms positive, that is, $a_n > 0$ for all $n$, then the partial sums will be
an increasing sequence:

$$S_1 < S_2 < S_3 < S_4 < \dots < S_n < S_{n+1} < \dots$$

That’s easy to see, since to get $S_{n+1}$ we add $a_{n+1} > 0$ to $S_n$: adding positive terms
makes things bigger. So the partial sums look sort of like this:

[Figure: number line showing the increasing partial sums $S_1$, $S_2$, $S_3$, $S_4$, ….]

Now, as we pointed out when studying the harmonic series, an increasing
sequence can only do two things as $n \to \infty$. Either it keeps growing arbitrarily large, so
$S_n \to +\infty$, or it converges to a limit. That’s because¹ if we have a bound $S_n < B$ then
it acts as a barrier: the partial sums can’t grow bigger than $B$ and can’t backtrack.
They will have to approximate some number (it could be $B$ itself or some smaller
number). Since they are increasing, there is no chance of any other behavior.

[Figure: number line showing $S_1$, $S_2$, $S_3$, $S_4$, … approaching the limit $S$.]

So that tells us:

Theorem 3.7.1. In a series with all of the terms positive

$$a_1 + a_2 + a_3 + \dots + a_n + \dots$$

either the partial sums go to infinity or the series converges.


For the harmonic series, we used the bad side of this: we showed that the partial 
sums were not bounded and concluded that the harmonic series did not converge. 
Here we highlight the good side: 


Theorem 3.7.2. If a series with all of the terms positive

$$a_1 + a_2 + a_3 + \dots + a_n + \dots$$

has bounded partial sums, then it converges.


The point is that we only need to know the partial sums are bounded, which is 
usually easier than trying to get at what they are actually equal to. 

One way this gets used is as follows: suppose we have two series with all terms
positive and that we happen to know that $a_n \le b_n$ for all $n$ and that the series $b_1 +
b_2 + \dots$ converges. Then we know:

• The partial sums $T_n = b_1 + b_2 + \dots + b_n$ are bounded.
• We have $S_n = a_1 + a_2 + \dots + a_n \le b_1 + b_2 + \dots + b_n = T_n$ for every $n$.

So the $S_n$ must be bounded too: the bound on the $T_n$ serves as a bound for the $S_n$ as
well. From the theorem, we can then conclude that the smaller series $a_1 + a_2 + \dots$
also converges. (This is the comparison test.) We can also conclude that

$$\lim_{n \to \infty} S_n \le \lim_{n \to \infty} T_n,$$

so if we know the sum of the bigger series we get a little bit of information about the
sum of the smaller series.

¹ The claim that bounded increasing sequences must converge, which we also used when studying the
alternating series, is just one way of saying that the real line does not have any holes. (This property is
known as “completeness.”) One can think of it as one of our base assumptions about the real numbers. If
you are curious about this, plan to take a course in Real Analysis.


Theorem 3.7.3 (Comparison Test). Suppose we have two series with positive terms,

$$a_1 + a_2 + a_3 + \dots + a_n + \dots$$

and

$$b_1 + b_2 + b_3 + \dots + b_n + \dots$$

for which we can show that
a. The series $b_1 + b_2 + b_3 + \dots + b_n + \dots$ converges.
b. For every $n$, $a_n \le b_n$.
Then the series $a_1 + a_2 + a_3 + \dots + a_n + \dots$ converges.

Here’s an example of how we can use this. Take the series from section 3.4

$$\frac{1}{2} + \frac{1}{6} + \frac{1}{12} + \dots + \frac{1}{n(n+1)} + \dots$$

We showed that it converges to 1. Now compare it to

$$\frac{1}{4} + \frac{1}{9} + \frac{1}{16} + \dots + \frac{1}{(n+1)^2} + \dots$$

Since for all $n \ge 1$

$$\frac{1}{(n+1)^2} < \frac{1}{n(n+1)}$$

(bigger denominator = smaller fraction) and we know the first series converges, we
can conclude that the second series also converges. We can also conclude that the
sum is less than 1, which is the sum of the bigger series.
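The comparison can be watched numerically. The Python sketch below is an illustration only (the helper `partial_sums` and the variable names are invented here): it builds the first 1000 partial sums of both series and checks that each partial sum of the smaller series stays below the matching partial sum of the bigger one, which in turn stays below 1.

```python
def partial_sums(term, count):
    """Return the partial sums term(1), term(1) + term(2), ... (count of them)."""
    sums, total = [], 0.0
    for n in range(1, count + 1):
        total += term(n)
        sums.append(total)
    return sums


smaller = partial_sums(lambda n: 1 / (n + 1) ** 2, 1000)  # 1/4 + 1/9 + ...
bigger = partial_sums(lambda n: 1 / (n * (n + 1)), 1000)  # 1/2 + 1/6 + ...

# Each partial sum of the smaller series is bounded by the bigger one,
# and the bigger partial sums never reach 1.
print(all(s < t < 1 for s, t in zip(smaller, bigger)))
print(smaller[-1], bigger[-1])
```

A finite check like this proves nothing about the limit, of course; it only shows the barrier from the bigger series doing its job on the terms we looked at.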

Once we know the series starting with 1/4 converges, we can add 1 in front
without changing the fact that it converges, since that just adds 1 to all the partial
sums. So we conclude that the series of reciprocals of squares converges:

$$1 + \frac{1}{4} + \frac{1}{9} + \frac{1}{16} + \dots + \frac{1}{n^2} + \dots = \text{something}.$$

But all we know is that the sum is less than 2 (adding one to what we knew before).
(This is where Leibniz and the Bernoulli brothers were stuck until 1736 or so, when
Euler discovered that the sum is actually $\pi^2/6$. That surprised everyone: why would
$\pi$ show up?)

The comparison only needs to hold for large enough $n$, since the behavior of the
partial sums when $n \to \infty$ only depends on the tail, when $n$ is large. The little trick of
lopping off a term then putting it back after we know the rest converges also tells us
something important, namely, that we can even handle a shift like $a_n \le b_{n+1}$. Since
scaling also doesn’t change convergence, we can use that as well: if $a_n \le 12b_{n+5}$ for
all $n > 10^6$ and the b-series converges, then so does the a-series. If you push this you
will end up with a convergence condition called “limit comparison,” which I will let
you look up if you are curious.

The big problem with this is that in general we only have the series in which we 
are interested (the a-series) and it’s not always easy to find a b-series that we can use. 
That requires a creative idea. In the next section, we will come up with a test (called 
the ratio test) that avoids that problem. 

Meanwhile, get some practice. 


3.7.1 Problems 
Problem 3.7.1: Use the comparison test to show that

$$1 + \frac{1}{8} + \frac{1}{27} + \dots + \frac{1}{n^3} + \dots$$

converges.


Problem 3.7.2: Generalize: show that $\sum_{n=1}^{\infty} \frac{1}{n^s}$ converges when $s \ge 2$.


Problem 3.7.3: We can also use the comparison test to show a series does not
converge. Show that $\sum_{n=1}^{\infty} \frac{1}{n^s}$ diverges when $s \le 1$.


Problem 3.7.4: Show that $\sum_{n=0}^{\infty} \frac{1}{2n+1}$ diverges.


Challenge: In this set of exercises, we have shown that the series

$$\frac{1}{1^s} + \frac{1}{2^s} + \frac{1}{3^s} + \dots + \frac{1}{n^s} + \dots$$

diverges if $s \le 1$ and converges if $s \ge 2$. Can you settle the remaining cases, when
$1 < s < 2$? (One method is to follow the ideas in Problem 3.6.2.)
3.8 Series with positive terms: part 2


The goal of this section is to introduce one of the more useful tests used to
determine whether a series converges. One of its best features is being quite simple, but
it is particularly useful for us because it is well suited to the kind of series we are
really most interested in, the power series which involve powers of $x$ or of $x - a$.

The test involves a comparison, but one that is hidden from view. Here’s the basic
idea. Look at a geometric series

$$a + ar + ar^2 + ar^3 + \dots + ar^n + \dots$$

(this is the series you know in your sleep except that we have made $x = r$ (because
we want to save $x$ for something else) and multiplied through by $a$). It’s easy to see
that the ratio between each term and the previous one is exactly $r$:

$$\frac{ar}{a} = \frac{ar^2}{ar} = \frac{ar^3}{ar^2} = \dots = \frac{ar^{n+1}}{ar^n} = r.$$

We can’t expect that in a general case

$$a_0 + a_1 + a_2 + \dots + a_n + \dots,$$

but suppose it is almost true, at least for large $n$. To write down something (a little)
more precise, suppose that there is some large number $k$ for which we have

$$\frac{a_{k+1}}{a_k} \approx \frac{a_{k+2}}{a_{k+1}} \approx \frac{a_{k+3}}{a_{k+2}} \approx \dots \approx r.$$

Then it seems like we could say that the series looks, in the long run, a lot like a
geometric series, and so should converge as long as $r < 1$.

I’ll give a precise proof below, but let me state the conclusion first. If that
hand-waving suffices for you, feel free to ignore the proof, which depends on a subtle (but
interesting) trick.


Theorem 3.8.1 (The Ratio Test). Suppose

$$a_0 + a_1 + a_2 + \dots + a_n + \dots$$

is a series with positive terms, so $a_n > 0$ for all $n$. Suppose also that we have

$$\lim_{n \to \infty} \frac{a_{n+1}}{a_n} = L.$$

Then
• If $L > 1$ the series diverges.
• If $L < 1$ the series converges.
• If $L = 1$ or the limit does not exist, we cannot conclude either way.
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This is very useful, because it does not involve finding another series for a com- 
parison, just computing the limit. Of course, it is not infallible: the limit may fail to 
exist or be equal to 1. 


Proof of the ratio test: 


Suppose first that L > 1. Then for large enough n the ratio $a_{n+1}/a_n$ will be close to
L, hence will be greater than 1. That says $a_{n+1} > a_n$ for all large enough n. But then
(since all terms are positive) we cannot have $a_n \to 0$ and the series cannot converge.

Suppose L < 1. (Here’s the subtle trick part.) Choose a number r that is slightly
larger than L but still less than 1, so that L < r < 1. Since the ratio $a_{n+1}/a_n$
converges to L, it will eventually have to be smaller than r. Let N be large enough so
that $a_{n+1}/a_n < r$ for all n ≥ N. Then we have $a_{N+1} < r a_N$. But we also have
$a_{N+2}/a_{N+1} < r$, so, plugging in, $a_{N+2} < r a_{N+1} < r^2 a_N$. Going on like that we see
$$a_k < r^{k-N} a_N \quad \text{for all } k > N.$$
That means that the series
$$a_N + a_{N+1} + a_{N+2} + \dots$$
has terms that are no larger than those of the series
$$a_N + r a_N + r^2 a_N + \dots$$
This last one is geometric, and since r < 1 it converges. By the comparison test, the
former series converges. Adding the initial segment back in, we conclude that the
original series
$$a_0 + a_1 + a_2 + \dots + a_n + \dots$$
converges.
So we have proved the first two claims. To show things are inconclusive when 
L = 1, I’ll let you check that for both the divergent harmonic series
$$1 + \frac{1}{2} + \frac{1}{3} + \dots + \frac{1}{n} + \dots$$
and the convergent
$$1 + \frac{1}{4} + \frac{1}{9} + \dots + \frac{1}{n^2} + \dots$$


we will have L = 1. So if L = 1 we really don’t know what will happen. 

Finally, if there is no limit we have nothing to work with, so there’s no conclusion 
in this case either. 

So, end of proof. Notice that the main tool here was a comparison, so all of this 
works only for series whose terms are positive. But wait, absolute values will rescue 
us (sort of).


3.8.1 Problems


Problem 3.8.1: Let’s try this on some series. In each case apply the ratio test and 
decide what it tells us. (The “what it tells us” will always be either “the series
converges,” “the series diverges,” or “the ratio test is inconclusive.”) I’ve given
examples both in summation notation and written out.


37 33 3” 

aT eer ec i rT a 
pus 

amt, (2n)! 

Sn +1 
oye 

n=1 

ei 23 n 
ln Me eae ae te 

tata el 


3.9 One way to handle series whose terms are not all 
positive 


As usual, I’m going to consider an example, but everything I will do applies in 
general. I invite you to check that as we go along. 
Let’s consider this series: 


$$1 - \frac{1}{4} - \frac{1}{9} + \frac{1}{16} - \frac{1}{25} - \frac{1}{36} + \frac{1}{49} - \dots \pm \frac{1}{n^2} \dots$$
where the pattern of signs is + — — throughout. We can’t use any of our results for 
series whose terms are positive, because these aren’t. And we can’t use our alternat- 
ing series theorem because the signs don’t alternate. So what can we do? 
Well, notice first that if we replaced each term by its absolute value, making all 
the signs plus, then we would get 


$$\sum_{n=1}^{\infty} \frac{1}{n^2},$$


which is a convergent series, as we showed above. Adding signs creates cancellation, 
and that should help with convergence. Can we prove that? 

Let’s consider four different series. We’ll call the original one, with + − − signs,
O (for “original”). We’ll call the one with all plus signs A (for “absolute value”). Next
we’ll create a series P (for “positive”) like this: when the term from O is positive,
keep it; when it is negative, write down 0. Finally, we create N the obvious way, with
a twist: when the term in O is negative, write down its absolute value, and write down
0 otherwise. Here’s what it looks like:

$$\begin{array}{ll}
O: & 1 - \frac{1}{4} - \frac{1}{9} + \frac{1}{16} - \frac{1}{25} - \frac{1}{36} + \frac{1}{49} - \dots \\[4pt]
P: & 1 + 0 + 0 + \frac{1}{16} + 0 + 0 + \frac{1}{49} + \dots \\[4pt]
N: & 0 + \frac{1}{4} + \frac{1}{9} + 0 + \frac{1}{25} + \frac{1}{36} + 0 + \dots \\[4pt]
A: & 1 + \frac{1}{4} + \frac{1}{9} + \frac{1}{16} + \frac{1}{25} + \frac{1}{36} + \frac{1}{49} + \dots
\end{array}$$


Now notice that the last three series all have positive terms, so we can use the 
comparison test on them. First compare P and A: every term in P is either equal 
to the corresponding term in A or it is zero, which is less than the corresponding 
term in A. Since A converges, we can conclude that P converges as well. Similarly, 
every term in N is either equal to or less than the corresponding term in A, so we 
can conclude that N converges as well. 

Suppose we write $N_k$ for the k-th partial sum of N and $P_k$ for the k-th partial sum
of P. Since both series converge, these partial sums converge to something. Let’s say
$$\lim_{k\to\infty} N_k = N \quad \text{and} \quad \lim_{k\to\infty} P_k = P.$$
The k-th partial sum of the original series is $O_k = P_k - N_k$ (for finite sums we can
rearrange, so we can add all the + terms and then subtract all the − terms). So the
limit of the partial sums of the original series is
$$\lim_{k\to\infty} O_k = \lim_{k\to\infty} (P_k - N_k) = \lim_{k\to\infty} P_k - \lim_{k\to\infty} N_k.$$
Since the last two limits exist, so does the limit of the $O_k$, and the original series O
will converge.
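
The bookkeeping in this argument can be checked on a computer. The sketch below builds the partial sums of O, P, N, and A for the + − − series using exact fractions; the function names `sign` and `partial_sums` are made up for this illustration.

```python
from fractions import Fraction

def sign(n):
    """Sign of the n-th term (n = 1, 2, 3, ...) in the + - - pattern."""
    return 1 if n % 3 == 1 else -1

def partial_sums(k):
    """Return (O_k, P_k, N_k, A_k) for the first k terms of the four series."""
    O = P = N = A = Fraction(0)
    for n in range(1, k + 1):
        size = Fraction(1, n * n)   # absolute value of the n-th term
        A += size
        if sign(n) > 0:
            P += size               # P keeps the positive terms
        else:
            N += size               # N keeps absolute values of negative terms
        O += sign(n) * size
    return O, P, N, A

for k in (10, 100, 1000):
    O, P, N, A = partial_sums(k)
    # O_k = P_k - N_k holds exactly, and P_k, N_k never exceed A_k.
    print(k, float(O), float(P), float(N), float(A), O == P - N)
```

The last column should always read True, which is just the statement that finite sums can be rearranged.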
The upshot is this: 


Theorem 3.9.1 (Absolute Convergence). If the series $\sum_{n=0}^{\infty} |a_n|$ converges, then so
does the series $\sum_{n=0}^{\infty} a_n$. When this happens we will say that the series
$\sum_{n=0}^{\infty} a_n$ converges absolutely.


We already know an example that shows that not all series that converge do it
absolutely. The alternating series
$$1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \dots$$
converges, but if we take absolute values we get the harmonic series
$$1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \dots$$
which we know does not converge. But our result is still useful, because it will allow 
us to use tests that apply only to series with positive terms (especially the ratio test) 
to more general series. 


Two notes: 


1) Series that converge but whose absolute value series do not converge are said
to converge conditionally, since their convergence depends on the way the signs are
arranged.² In your real analysis course (you’re going to take one, right?) you will
learn that absolute convergence is much friendlier than conditional convergence. For
example, if a series is absolutely convergent then reordering the terms will change
neither the convergence nor the sum. On the other hand, a conditionally convergent
series can be reordered to give a different sum, or to fail to converge at all. So the
commutative law is false for “infinite addition” unless the convergence is absolute.


2) The proof above is based on a fact about the real numbers: the only solutions 
to |x| = b are x = b and x = −b. That is what allowed us to split our series into
P and N. This is not true for complex numbers, but the theorem is still true. That 
tells me that this is not the best proof of the theorem. The best proof uses the Cauchy 
Criterion, however, and that is a bit harder to understand, so the easier proof will do 
for us. 
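
The claim in the first note about reordering can be watched numerically. The sketch below sums the alternating harmonic series in its usual order and then in a reordering that takes two positive terms followed by one negative term. It is a standard fact (not proved in this book) that this particular reordering converges to (3/2) ln 2 instead of ln 2; the function names are hypothetical.

```python
from math import log

def alternating_harmonic(terms):
    """Partial sum of 1 - 1/2 + 1/3 - 1/4 + ... in the usual order."""
    return sum((-1) ** (n + 1) / n for n in range(1, terms + 1))

def rearranged(blocks):
    """Same terms, reordered: two positive terms, then one negative term."""
    total = 0.0
    odd, even = 1, 2
    for _ in range(blocks):
        total += 1 / odd + 1 / (odd + 2) - 1 / even
        odd += 4    # move on to the next two odd denominators
        even += 2   # move on to the next even denominator
    return total

print("usual order:", alternating_harmonic(100000), "ln 2 =", log(2))
print("reordered:  ", rearranged(100000), "1.5 ln 2 =", 1.5 * log(2))
```

Every term of the original series appears exactly once in the reordered one, yet the two sums differ.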


There is much more I could say about infinite series in general, but that would 
divert us from our real goal, which is to understand power series. If you want to see 
more, the natural place to start would be a real analysis textbook. You might also 
look at [6] and/or [15]. 


²That’s the standard explanation, but I’m willing to bet that it’s also a punning reference to the
absolute/conditional distinction in philosophy.




4 Power Series 


Now that we know the basics of convergence, let’s go back to the thing we are really 
interested in, namely, series in increasing powers of x or of x — a. We know that 
for any (infinitely differentiable) function f(x) we can find a Taylor series for f(x), 
which will look like 


$$c_0 + c_1(x - a) + c_2(x - a)^2 + \dots + c_n(x - a)^n + \dots$$


for some list of numbers $c_0, c_1, c_2, \dots$. We want to know whether this series converges
and if so (a separate question!) whether the sum is actually f(x). If this happens, then 
we can think of the series as another way to write down the function f(x), one that 
is sometimes easier to work with. 

We also want to reverse the process, however. Suppose we start from some list 
of numbers $c_0, c_1, c_2, \dots$. Then we can make a series from them:


$$c_0 + c_1(x - a) + c_2(x - a)^2 + \dots + c_n(x - a)^n + \dots.$$


If for some value x this series converges, we get a number that depends on x. If we 
call that number f(x) we have just defined a new function, one whose Taylor series 
is the one we wrote down. 

This is again like decimal expansions. If we start with a number like 1/3, we 
get a decimal expansion 0.3333.... But we can do it backward too: if we write 
0.12345 ..., that is some real number. (The implied series always converges; can 
you check that?) And I really know nothing about the number 0.12345... except 
that expansion. 

The first thing we need to do is decide when a power series converges (to some- 
thing). Then we will consider how to manipulate known series to find new series. 
Finally, we will look at some of the things we can do with power series: create new 
functions, solve differential equations, and study numerical sequences via generating 
functions. 

There is a name for series of the kind we want to study: power series. To be 
careful, here’s a definition. 


A power series centered at a is a series whose terms look like $c_n(x - a)^n$,
where each $c_n$ is a number.


All of the Taylor series we have written down are power series. But of course 
there are series that do not look like this. For example, this is not a power series: 


$$1 + 2(x - 1) + 3(x - 2)^2 + 4(x - 3)^3 + \dots + (n + 1)(x - n)^n + \dots$$
There is a clear pattern involving powers, but there isn’t a center as there must be for
power series.

There are two big reasons to like power series. First, they are the ones we get out 
of the Taylor polynomials. In other words, they arise in a fairly “natural” way from 
the theory of derivatives. Second, and much more important, power series behave 
really well. That is the main message of this chapter. 


4.1 Convergence of power series 


Let’s look at when a power series converges. If it is a series centered at some 
a ≠ 0, we can always let u = x − a to get a power series in powers of u centered at
0. Whatever we prove about that series we can translate back replacing u by x — a. 
So we will first do our analysis for series centered at 0, then translate to the general 
case. 

Suppose we have a power series centered at a = 0: 


$$c_0 + c_1 x + c_2 x^2 + \dots + c_n x^n + \dots$$


The first thing to note is that when x = 0 this is just equal to $c_0$, so there is always
at least one value of x for which the series converges. This is good to know but not
very interesting.¹

What about other values of x? Let’s work out what happens in a special case, 
when none of the coefficients $c_n$ are zero. One way to check for convergence of a
series (almost the only one we know) is to take absolute values (to get a series with
positive terms) and then apply the ratio test. For that, we need to compute
$$\lim_{n\to\infty} \frac{|c_{n+1} x^{n+1}|}{|c_n x^n|}
= \lim_{n\to\infty} \frac{|c_{n+1}|\,|x|^{n+1}}{|c_n|\,|x|^n}
= \lim_{n\to\infty} \frac{|c_{n+1}|}{|c_n|}\,|x| = L\,|x|$$
as long as the limit $\lim_{n\to\infty} \frac{|c_{n+1}|}{|c_n|}$ exists and is equal to L.

If L = 0, then L|x| = 0 is less than 1 no matter what x is, and so no matter what
x is we can conclude that the series converges. So in this case the series converges
for every x. If L = ∞, then L|x| = ∞ unless x = 0, so the series converges only
when x = 0.

For the other cases, the answer will depend on x:

• The series converges when L|x| < 1, that is, when |x| < 1/L.

• The series diverges when L|x| > 1, that is, when |x| > 1/L.


We usually write R = 1/L and call it the radius of convergence. Since saying |x| < 
R means —R < x < R, this translates nicely to a picture: 


[Figure: a number line with −R, 0, and R marked.]

Here a = 0 is at the center, and we know the series converges inside the range
from −R to R and diverges outside that range. When x = ±R the limit in the ratio
test is 1, and that tells us nothing, so we don’t know whether the series converges
or not at those points. If we want to know exactly for which x there is convergence,
those two values need to be considered separately.
The cases of L = 0 and L = +∞ are usually included, like this:

• If the series always converges, i.e., converges for every x, we say R = ∞.
(Picture ±R moving farther and farther from zero.)

• If the series converges only for x = 0, then we say R = 0.

That way (as long as the ratio test limit exists or is +∞) we always have a radius of
convergence.

¹ Still, it’s nice; the non-power series above does not converge for any value of x.

If instead of a series centered at zero we have one in powers of x — a, the result 
will be similar. The series will always converge when x = a. For other x, we use the 
ratio test; the limit looks like 


$$\lim_{n\to\infty} \frac{|c_{n+1}(x - a)^{n+1}|}{|c_n (x - a)^n|}
= \lim_{n\to\infty} \frac{|c_{n+1}|\,|x - a|^{n+1}}{|c_n|\,|x - a|^n}
= \lim_{n\to\infty} \frac{|c_{n+1}|}{|c_n|}\,|x - a| = L\,|x - a|.$$

Again we want to see if this is less than or greater than 1, and we end up with a
condition that looks like |x − a| < R. This translates to −R < x − a < R or, if
you like, a − R < x < a + R. It’s the same picture again: a is at the center, and
convergence happens as long as we are less than R units away from a.

[Figure: a number line with a − R, a, and a + R marked.]

We did these computations assuming that none of the coefficients were zero (which
isn’t true for the sine series!) and that the limit in the ratio test exists. But in
fact (though we won’t give a proof) it is the case that all power series behave this
way.


Theorem 4.1.1. Consider a power series 
$$c_0 + c_1(x - a) + c_2(x - a)^2 + \dots + c_n(x - a)^n + \dots$$

Then one of the following things is true:

a. The series converges only if x = a. In that case we say that the radius of
convergence of the series is R = 0.

b. The series converges for every x. In that case we say that the radius of
convergence of the series is R = ∞.

c. There exists a number R > 0 such that the series converges when |x − a| < R
and diverges when |x − a| > R. We call R the radius of convergence of the
series.

In the third case, the series may or may not converge at the endpoints x = a − R and
x = a + R.


There are two ways to prove this theorem. In the first, we use the comparison test 
to show that if the series converges when |x — a| = b then it also converges when 
|x — a| < b. That implies that R must exist, but doesn’t find it. The second method 
actually finds the radius of convergence R using a stronger result called the “root 
test” and a generalized notion of limit. 

Given a power series, we can often find the radius of convergence R by using the 
ratio test. Of course, this never tells us what the series converges to, but at least it 
may tell us that it converges to something. The ratio test never tells us what happens 
at the endpoints, when |x — a| = R, which have to be investigated directly. 
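
As a rough illustration of R = 1/L, the sketch below estimates the radius from the coefficients alone by evaluating $|c_n| / |c_{n+1}|$ at one large index. The helper `radius_estimate` and the three coefficient lists are assumptions made for this example, and the estimate only makes sense when no coefficient is zero and the ratio limit exists.

```python
from fractions import Fraction
from math import factorial

def radius_estimate(coeff, n):
    """Estimate R = 1/L by |c_n| / |c_{n+1}| at one large index n.

    Assumes c_n and c_{n+1} are both nonzero.
    """
    return abs(coeff(n)) / abs(coeff(n + 1))

# Hypothetical coefficient lists c_n, chosen to show the three cases.
examples = {
    "c_n = 1/2^n (expect R = 2)": lambda n: Fraction(1, 2**n),
    "c_n = n     (expect R = 1)": lambda n: Fraction(n),
    "c_n = 1/n!  (expect R = infinity)": lambda n: Fraction(1, factorial(n)),
}

for name, coeff in examples.items():
    print(name, [float(radius_estimate(coeff, n)) for n in (10, 100, 1000)])
```

In the last example the estimates keep growing without bound, which is how R = ∞ shows up numerically.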


4.2 Examples of finding the radius of convergence 


We now know that power series always have a radius of convergence. The argu- 
ment in the previous section suggested that we can find the radius of convergence by 
using the fact that R = 1/L where
$$L = \lim_{n\to\infty} \frac{|c_{n+1}|}{|c_n|}.$$
That’s actually not the best way to do it, because many power series have $c_n = 0$ for
some values of n. Then the limit above won’t exist, because many of the fractions 
have zero in the denominator. But when there are coefficients that are zero, there 
is usually a pattern (as in the sine function, whose even-numbered coefficients are 
zero), and it turns out we can still do the computation by working directly with the 
ratio test. So we’ll do that in all our examples too. 

First let’s look at some of the series we know in our sleep. 


1. Exponential. We know that for all x we have 
$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots + \frac{x^n}{n!} + \dots$$


So the radius of convergence of this series must be infinity. Let’s use the ratio test to
see that it is. We need to compute this monster:
$$\lim_{n\to\infty} \frac{\dfrac{|x|^{n+1}}{(n+1)!}}{\dfrac{|x|^n}{n!}}.$$
In the denominator we just put the absolute value of the general term of the series, while
in the numerator we replace n by n + 1 everywhere. When factors are already positive,
we don’t need to worry about absolute values, of course. Then we rearrange and
cancel as much as we can. (Factorials and powers of |x| are great for canceling!)
So:
$$\begin{aligned}
\lim_{n\to\infty} \frac{\dfrac{|x|^{n+1}}{(n+1)!}}{\dfrac{|x|^n}{n!}}
&= \lim_{n\to\infty} \frac{|x|^{n+1}\, n!}{|x|^n\, (n+1)!} \\
&= \lim_{n\to\infty} \frac{|x|^{n+1}}{|x|^n} \cdot \frac{n!}{(n+1)!} \\
&= \lim_{n\to\infty} |x| \, \frac{1}{n+1} = 0,
\end{aligned}$$
since x is fixed and n is going to infinity.

The limit is 0 no matter what x is, so it is less than 1 no matter what x is, which 
means that the series always converges no matter what x is, i.e., the radius of con- 
vergence is infinity. 

Notice that the ratio test does not tell us what the sum is, just that there is one.
We know it is equal to $e^x$ because we estimated the difference between the partial
sums and $e^x$ using our formula for the error in the approximation given by the
Taylor polynomials. Finding the sum always takes a lot more work than just checking
convergence.


2. Sine and Cosine. Now let’s do the series for sin(x), which is very similar except 
for the fact that only odd powers of x appear. We know 


$$\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \dots + (-1)^n \frac{x^{2n+1}}{(2n+1)!} + \dots$$
Since this is true for every x, we already know R = ∞ but let’s check it anyway. The
main difficulty is getting the algebra right. So here we go.
Notice that replacing n by n + 1 changes 2n + 1 into 2n + 3, the next odd number.
So the ratio test limit looks like
$$\lim_{n\to\infty} \frac{\dfrac{|x|^{2n+3}}{(2n+3)!}}{\dfrac{|x|^{2n+1}}{(2n+1)!}}
= \lim_{n\to\infty} \frac{|x|^{2n+3}\,(2n+1)!}{|x|^{2n+1}\,(2n+3)!}.$$
Now notice that in the numerator there are two extra factors of |x|, since
$|x|^{2n+3} = |x|^2 |x|^{2n+1}$; similarly,
$$(2n+3)! = (2n+3)(2n+2)\,(2n+1)!$$
So we have
$$\lim_{n\to\infty} \frac{\dfrac{|x|^{2n+3}}{(2n+3)!}}{\dfrac{|x|^{2n+1}}{(2n+1)!}}
= \lim_{n\to\infty} \frac{|x|^{2n+3}\,(2n+1)!}{|x|^{2n+1}\,(2n+3)!}
= \lim_{n\to\infty} \frac{|x|^2}{(2n+3)(2n+2)} = 0.$$
Since the answer is less than 1 no matter what x is, we can conclude R = ∞ again.


3. One for which we don’t know the answer yet. Let’s consider 
$$\sum_{n=1}^{\infty} n x^{2n} = x^2 + 2x^4 + 3x^6 + \dots + n x^{2n} + \dots$$

For the ratio test, we need to consider the limit
$$\lim_{n\to\infty} \frac{(n+1)|x|^{2n+2}}{n|x|^{2n}}.$$
Notice again how we build that: the denominator is the absolute value of the general
term; the numerator is the same expression with n replaced by n + 1. Since n and
n + 1 are positive, they don’t need absolute values. As always, the key is to notice
that most of the powers of |x| cancel out. We get
$$\lim_{n\to\infty} \frac{(n+1)|x|^{2n+2}}{n|x|^{2n}} = \lim_{n\to\infty} \frac{n+1}{n}\,|x|^2 = |x|^2.$$
So the ratio test tells us that the series converges when $|x|^2 < 1$ and diverges when
$|x|^2 > 1$. Taking square roots (everything is positive!), we see that the radius of
convergence is R = 1.

For the previous examples we didn’t need to worry about endpoints, because the 
series converged for every value of x. In this case, we know the series converges 
when |x| < 1 and diverges when |x| > 1, but we need to test what happens when 
|x| = 1. For that, we just plug in x = ±1. Since we have even powers throughout, in
both cases we get 

$$1 + 2 + 3 + \dots + n + \dots$$
which just gets bigger and bigger and so does not converge. So the series converges
for |x| < 1 and diverges for |x| ≥ 1.

4. One final example. Look at
$$\sum_{n=1}^{\infty} \frac{x^n}{n} = x + \frac{x^2}{2} + \frac{x^3}{3} + \frac{x^4}{4} + \dots$$
The ratio test is easy:
$$\lim_{n\to\infty} \frac{\dfrac{|x|^{n+1}}{n+1}}{\dfrac{|x|^n}{n}} = \lim_{n\to\infty} |x|\,\frac{n}{n+1} = |x|,$$
so again we have convergence for |x| < 1 and divergence for |x| > 1, and the radius
R = 1.
What is interesting is what happens at the endpoints. When x = 1 we get
$$1 + \frac{1}{2} + \frac{1}{3} + \dots$$
That is the harmonic series, which we know diverges. But if we make x = −1 we
get
$$-1 + \frac{1}{2} - \frac{1}{3} + \frac{1}{4} - \dots$$
which is a convergent alternating series. So R = 1, but we have some extra informa-
tion: the series converges when −1 ≤ x < 1 and diverges when x < −1 or x ≥ 1.
This emphasizes that knowing R doesn’t tell us what will happen at the endpoints of
the interval of convergence.
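
The difference between the two endpoints in the last example is easy to see numerically. The sketch below (with a hypothetical helper `partial_sum`) adds up the first few terms of the series at x = 1 and at x = −1. At x = −1 the series is the negative of the alternating harmonic series, so its sum is −ln 2.

```python
from math import log

def partial_sum(x, terms):
    """Sum of x^n / n for n = 1, ..., terms."""
    return sum(x ** n / n for n in range(1, terms + 1))

for terms in (10, 1000, 100000):
    # At x = 1 the partial sums keep growing (slowly);
    # at x = -1 they settle down.
    print(terms, partial_sum(1.0, terms), partial_sum(-1.0, terms))

print("-ln(2) =", -log(2))
```

The growth at x = 1 is very slow, which is a good reminder that numerical experiments alone cannot settle divergence.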


4.2.1 Problems 


Problem 4.2.1: What is the radius of convergence for the rest of the series you know 
in your sleep? 


Problem 4.2.2: Use the ratio test to decide for which values of x the series below 
converge. 


a. y nx", d. y n(x — 3)". 
n=0 


n=0 
wo non foe) 
be 8 - e. ("x 
n=0 Je n=0 
a5 au 1)"2"x" ft y n!x"” 
nm+7 93n+2° 


n=0 n=0


Problem 4.2.3: Find the radius of convergence for the power series


wo 
n2x2n 


pyar 


n=0 


Can you decide what happens at the endpoints of the region of convergence? 


Problem 4.2.4: Consider a power series $\sum_{n=0}^{\infty} c_n (x - 3)^n$. Suppose you know that the
series converges when x = −1 and that it diverges when x = 9. For each of the
statements below, decide whether the truth or falsity of the statement can be determined
from this information, and if so decide which it is.


a. The series converges when x = 3. 
b. The series diverges when x = 8. 

c. The series converges when x = —9. 
d. The series converges for all x. 

e. The series converges when x = 1. 


Problem 4.2.5: Suppose you are writing an exam and want to include a problem 
where students use the ratio test to find a radius of convergence. You want it to be 
fairly interesting, so you don’t want R to be one of the standard trio 0, 1, ∞. You
also do not want the center to be 0. Come up with a good example and write out the 
solution. 


4.3 Power series freedom


Playing around with series as if they were finite sums can lead to mistakes. For 
example, if you change the order of the terms in a conditionally convergent series, 
you can change the sum. “Infinite addition,” as opposed to ordinary finite addition, 
is not commutative. 

But power series are special. This section summarizes several operations that can 
be done with power series without messing up convergence or giving unexpected
results. Some of these are fairly easy to prove, while others are much harder. (In the 
list below, the proofs get harder and harder as we go on.) 

We will just state the results without trying to give proofs, but let’s say a word 
(OK, two words) about why it all works. Two properties of power series are critical. 
The first you already know about: since we find the radius of convergence using 
the ratio test, we will always have absolute convergence, because that is what the 
ratio test tests for. (To be precise, the convergence is absolute except maybe at the 
endpoints.) The second property you haven’t heard about yet (but see Section 5.2
if you are curious): on a closed interval inside the region of convergence, we will
always have uniform convergence.

Taken together, these two properties guarantee that power series behave very 
nicely under all sorts of algebraic and calculus procedures. And that is wonderful: it 
allows us to find power series for complicated functions by using the series we know 
in our sleep. 


Plugging in: We can always replace x by something else; the resulting series will 
converge when the thing we plugged in satisfies the convergence condition. Start, for 
example, with the geometric series 

$$\frac{1}{1 - x} = 1 + x + x^2 + x^3 + \dots + x^n + \dots,$$
valid when |x| < 1. We can replace x by u − 1 to get
$$\frac{1}{2 - u} = 1 + (u - 1) + (u - 1)^2 + (u - 1)^3 + \dots,$$
which will converge when |u − 1| < 1. Notice that the result is a power series centered
at 1. We changed the variable to u, but variable names are arbitrary, so we could have
used x.

A similar computation can be used to shift the center of a series. For example,
since $1 - x = 3 - x - 2 = 3 - (x + 2) = 3\left(1 - \tfrac{1}{3}(x + 2)\right)$, we can use the geometric
series to compute
$$\frac{1}{1 - x} = \frac{1}{3} \cdot \frac{1}{1 - \frac{x+2}{3}}
= \frac{1}{3} \sum_{n=0}^{\infty} \frac{(x + 2)^n}{3^n}
= \sum_{n=0}^{\infty} \frac{(x + 2)^n}{3^{n+1}},$$
which is the Taylor series for 1/(1 − x) centered at a = −2. It is valid when
|x + 2| < 3.
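
Here is a quick numerical check of the recentered series, written as a sketch with a made-up function name. It also shows something pleasant: at x = −3 the original geometric series in powers of x diverges, but the series centered at −2 converges there, because |−3 + 2| = 1 is less than 3.

```python
def recentered(x, terms=200):
    """Partial sum of (x + 2)^n / 3^(n + 1) for n = 0, ..., terms - 1."""
    return sum((x + 2) ** n / 3 ** (n + 1) for n in range(terms))

for x in (-3.0, -2.0, 0.0, 0.5):
    # Compare the partial sum with the function value 1 / (1 - x).
    print(x, recentered(x), 1 / (1 - x))
```

Closer to the edge of the interval −5 < x < 1 the convergence gets slower, so more terms would be needed there.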


Adding and Subtracting: We can add or subtract two power series term by term. 
The region of convergence of the resulting series is the region where both series 
converge. So, for example, if we start with the series for $e^x$ and plug in −x for x, we
get a series for $e^{-x}$. Adding the two, we have
$$e^x + e^{-x} = \sum_{n=0}^{\infty} \frac{x^n}{n!} + \sum_{n=0}^{\infty} \frac{(-x)^n}{n!}
= \sum_{n=0}^{\infty} \frac{x^n + (-x)^n}{n!}.$$
Since the series for $e^x$ is valid for all x, so is this one. Now we can simplify. When n
is odd, $(-x)^n = -x^n$ and that term cancels out. When n is even we have $(-x)^n = x^n$.
So only the even terms survive; we can write the final series like this:
$$e^x + e^{-x} = \sum_{k=0}^{\infty} \frac{2x^{2k}}{(2k)!}.$$

If the summation notation confuses you, write out the series to see. The key thing 
is that we are adding term by term, pairing together those terms that have the same 
degree.

Multiplying and Dividing: We can also multiply, and again the convergence will
happen wherever both series converge. Remember, though, that when you multiply two
sums you need to distribute one sum over the other (i.e., “foil it”), and here there are
infinitely many things to add. Dividing also works as long as the series we are
dividing by is nonzero, but it is even harder to work out the general term.

But you don’t always want to multiply or divide by a whole series. Multiplying
by x is particularly easy and doesn’t affect the convergence because the series x isn’t
really a series. Dividing by x is also safe as long as the series we start with has no
term of degree zero. So, for example,
$$\frac{\sin(x)}{x} = \frac{1}{x} \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n+1}}{(2n+1)!}
= \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n}}{(2n+1)!}
= 1 - \frac{x^2}{3!} + \frac{x^4}{5!} - \frac{x^6}{7!} + \dots$$
This is true for any x ≠ 0, and it shows that we should define the function to be 1
when x equals zero.


Derivatives and Integrals: We can differentiate and integrate series term by term. 
This will not change the radius of convergence, though the behavior at the endpoints 
may change.² I hinted at this above when I used the series for 1/(1 + x) to get a series
for ln(1 + x).

Here’s a more interesting, and famous, example. The derivative of f(x) = arctan(x)
is $f'(x) = \frac{1}{1 + x^2}$. We can find a series for the derivative by starting
from a geometric series. Here we go:
$$\frac{1}{1 - u} = 1 + u + u^2 + u^3 + \dots + u^n + \dots \quad \text{if } |u| < 1.$$
Replacing u by $-x^2$,
$$\frac{1}{1 + x^2} = 1 - x^2 + x^4 - x^6 + \dots + (-1)^n x^{2n} + \dots,$$
valid if $|-x^2| < 1$, which just says |x| < 1 again. Now take the antiderivative:
$$\arctan(x) = C + x - \frac{x^3}{3} + \frac{x^5}{5} - \dots + (-1)^n \frac{x^{2n+1}}{2n+1} + \dots,$$
still valid when |x| < 1. Notice that computing antiderivatives always gives an
arbitrary constant, but we can find the right constant by setting x = 0 on both sides:
since arctan(0) = 0, we must have C = 0. So finally
$$\arctan(x) = x - \frac{x^3}{3} + \frac{x^5}{5} - \dots + (-1)^n \frac{x^{2n+1}}{2n+1} + \dots \quad \text{if } |x| < 1.$$
This is one bit of evidence for a general rule: the arc tangent function behaves much
better than the tangent function.


²This is the only case when the behavior at the endpoints can change.


In this example, we see a change in the behavior at the endpoints. The series for
$1/(1 + x^2)$ does not converge when x = ±1, but the series for arctan(x) does.
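
The change at the endpoint can be seen in a small experiment. In the sketch below (function names are hypothetical), the partial sums of the series for $1/(1 + x^2)$ at x = 1 just bounce between 1 and 0, while the partial sums of the integrated series slowly approach arctan(1).

```python
from math import atan

def derivative_series(x, terms):
    """Partial sum of (-1)^n x^(2n), the series for 1 / (1 + x^2)."""
    return sum((-1) ** n * x ** (2 * n) for n in range(terms))

def arctan_series(x, terms):
    """Partial sum of (-1)^n x^(2n+1) / (2n+1), the series for arctan(x)."""
    return sum((-1) ** n * x ** (2 * n + 1) / (2 * n + 1) for n in range(terms))

# Inside the interval both series behave well.
print(arctan_series(0.5, 40), atan(0.5))

# At the endpoint x = 1 only the integrated series settles down.
for terms in (10, 11, 1000, 1001):
    print(terms, derivative_series(1.0, terms), arctan_series(1.0, terms))
print("arctan(1) =", atan(1.0))
```

Convergence at the endpoint is slow: it takes roughly a thousand terms to get three correct decimal places.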


Composition: If we have the series for two functions f(x) and g(x) and g(0) = 0, 
then we can find a series for f(g(x)) by plugging in one series into another. This is 
a messy process, but the resulting series will converge to the composed function in 
some reasonable interval. 


Behavior at the endpoints: Suppose we have a series for f(x) that converges when 
|x| < R. As we know, we can’t be sure what happens at the endpoints: the series 
might converge or not. But suppose it does converge: if we plug in x = R, do we get 
f(R)?

The answer is given by Abel’s theorem. It says that as long as the function is
continuous at x = R and the series converges when x = R, then the sum will be
f(R). (We need both things: a continuous function and a convergent series.)

For example, look at the logarithm series 


$$\ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \dots$$
We know this series converges to ln(1 + x) when |x| < 1. But if we plug in x = 1
on either side everything works: ln(1 + 1) = ln(2) and the function is continuous at
that point, while the series becomes alternating:
$$1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \dots$$
Since we know the series converges and the function is continuous, Abel’s theorem
says that we can conclude that
$$\ln(2) = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \dots$$
On the other hand, plugging in −1 doesn’t work for the function or the series.
Another fun example is the series for arctan(x). Since arctan(1) = π/4 and the
function is continuous everywhere, we get
$$\frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \dots$$
This series for π/4 was first discovered by Leibniz.


What does it converge to?: One of the nice things about all this is that these manip- 
ulations do not just preserve convergence. They also preserve what the series con- 
verges to. Since we have a set of series whose sums we know (in our sleep), we can 
use them to discover what other series converge to. 

Here’s a famous example that we looked at before, the series 


$$f(x) = \sum_{n=1}^{\infty} n x^{2n} = x^2 + 2x^4 + 3x^6 + \dots + n x^{2n} + \dots$$

We saw above that it converges when |x| < 1, but we didn’t know what it converges
to. I called it f(x) because I don’t know what it is. Can we find out?

We can if we can figure out how to start with series we know to get this series.
This takes some trickery, but it can be done. I’ll actually work backward first,
transforming this series into one we know. When I finally get a known function, I can
unwind all my transformations to see what the original series is equal to.

I'll omit the many failed attempts to find just the right trick, but you can be sure 
they existed. (You should always keep this in mind when reading a mathematics 
book.) 

Notice first that our series involves only even powers of x. I don’t like that, so 
let’s change the variable to $u = x^2$ to get rid of it. This gives some function


$$g(u) = u + 2u^2 + 3u^3 + \dots$$


which converges when |u| < 1. 
Here’s the dastardly trick: divide through by u. That won’t affect the convergence, 
and 


$$\frac{g(u)}{u} = 1 + 2u + 3u^2 + \dots,$$
which looks like a derivative: the general term is $n u^{n-1}$. If we take the antiderivative
we get a function
$$h(u) = C + u + u^2 + u^3 + \dots$$
If we take C = 1 this is the geometric series! As we’ll see soon, we can choose any C
we like, so let’s choose the good one. So we set C = 1 and we know that for |u| < 1,
$$h(u) = 1 + u + u^2 + u^3 + \dots = \frac{1}{1 - u}.$$
All right, we have a function now.
Now work backward. The derivative of h(u) is
$$h'(u) = \frac{1}{(1 - u)^2}.$$
(Since the derivative of a constant C is zero, the answer is the same for any value of
the constant C. So it doesn’t matter that we chose C = 1.)
But remember that $h'(u) = \frac{1}{u}\, g(u)$, so as long as |u| < 1 we have
$$\frac{1}{u}\, g(u) = \frac{1}{(1 - u)^2}.$$
Multiply by u to get g(u):
$$g(u) = \frac{u}{(1 - u)^2}.$$
We’re almost there.


Now remember that we got g(u) by making $u = x^2$ in the series for f(x). So
make $u = x^2$ in the answer too:
$$f(x) = \frac{x^2}{(1 - x^2)^2}.$$
All of this is true when |x| < 1, since none of our moves changed the region of
convergence. So now we know what our series converges to!
If |x| < 1,
$$x^2 + 2x^4 + 3x^6 + \dots + n x^{2n} + \dots = \frac{x^2}{(1 - x^2)^2}.$$
Isn’t that cool? Just try figuring that series out by taking derivatives...


(Did all that manipulation make you nervous? It did me. We can always check, 
however: 


sage: taylor(x^2/(1-x^2)^2,x,0,10)
5*x^10 + 4*x^8 + 3*x^6 + 2*x^4 + x^2


Wooo!) 
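
For readers without Sage at hand, the same identity can be checked by hand-rolled arithmetic. Since $g(u) = u/(1 - u)^2$ is equivalent to $(1 - u)^2 g(u) = u$, the sketch below multiplies the truncated series $u + 2u^2 + 3u^3 + \dots$ by $1 - 2u + u^2$ and looks at the coefficients. The function name is made up for this illustration, and the check is exact only up to the degree where the series was truncated.

```python
def times_one_minus_u_squared(coeffs):
    """Multiply a truncated series in u by (1 - u)^2 = 1 - 2u + u^2.

    coeffs[k] is the coefficient of u^k; the result is truncated
    at the same degree as the input.
    """
    out = []
    for k in range(len(coeffs)):
        c = coeffs[k]
        if k >= 1:
            c -= 2 * coeffs[k - 1]
        if k >= 2:
            c += coeffs[k - 2]
        out.append(c)
    return out

g = list(range(12))  # coefficients of g(u) = u + 2u^2 + 3u^3 + ... up to u^11
product = times_one_minus_u_squared(g)
print(product)  # every coefficient is 0 except the coefficient of u, which is 1
```

Putting $u = x^2$ back in turns this into the statement checked by the Sage session.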


To end with a moral: the fact that we can manipulate series means that we hardly 
ever need to use Taylor’s formula to find a power series for a standard function. 
Instead, we start from series we know and manipulate them to find the series we 
want. Maybe I should make this stronger: 


Warning: Never try to find a power series representation by taking 
higher derivatives! Instead, use the series you know in your sleep. This 
is especially true on tests! 


That’s a bit of an overstatement, of course, since we did use derivatives to find the 
series we know in our sleep. But that’s the point: we did that because for those func- 
tions it can be done. For almost all other functions, taking many higher derivatives 
is hopeless. 


Here are a few examples. 


a. $g(x) = \int \frac{\sin(x)}{x}\,dx$

Let’s do this using summation notation. First, as we saw above,
$$\frac{\sin(x)}{x} = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n}}{(2n+1)!}.$$
Now we integrate term by term. The antiderivative of $x^k$ is $\frac{x^{k+1}}{k+1}$, and we
apply this to each term to get
$$\int \frac{\sin(x)}{x}\,dx = C + \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n+1}}{(2n+1)\,(2n+1)!},$$
everything valid for all x.


b. $\dfrac{x}{2+x^2}$


The closest series we know in our sleep is the geometric series

$$\frac{1}{1-u} = 1 + u + u^2 + u^3 + \dots + u^n + \dots$$

We can take care of the $x$ in the numerator by just factoring it out, but we need
a $1 - u$ in the denominator. The trick is to also factor out a 2:

$$\frac{x}{2+x^2} = \frac{x}{2}\cdot\frac{1}{1+x^2/2}.$$

Now let $u = -x^2/2$ in the geometric series to get

$$\frac{1}{1+x^2/2} = 1 + (-x^2/2) + (-x^2/2)^2 + (-x^2/2)^3 + \dots + (-x^2/2)^n + \dots$$

which simplifies to

$$\frac{1}{1+x^2/2} = 1 - \frac{x^2}{2} + \frac{x^4}{2^2} - \frac{x^6}{2^3} + \dots + (-1)^n \frac{x^{2n}}{2^n} + \dots$$

This is valid when $|u| < 1$, so when $|x^2/2| < 1$, which translates to $|x^2| < 2$
and then to $|x| < \sqrt{2}$. Finally, multiply by $x/2$ to get

$$\frac{x}{2+x^2} = \frac{x}{2} - \frac{x^3}{2^2} + \frac{x^5}{2^3} - \frac{x^7}{2^4} + \dots + (-1)^n \frac{x^{2n+1}}{2^{n+1}} + \dots$$

valid when $|x| < \sqrt{2}$.
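
A small Python sketch can make the region of validity visible: inside |x| < √2 the partial sums settle down on x/(2 + x²), and just outside they blow up. The names `geometric_style_sum` and `target` are invented for this example.

```python
def geometric_style_sum(x, terms):
    """Sum of (-1)^n x^(2n+1) / 2^(n+1) for n = 0, ..., terms - 1."""
    return sum((-1) ** n * x ** (2 * n + 1) / 2 ** (n + 1) for n in range(terms))


def target(x):
    """The function the series is supposed to represent."""
    return x / (2 + x ** 2)


# 1.0 and -1.3 are inside |x| < sqrt(2) (about 1.414); 1.5 is outside.
for x in (1.0, -1.3, 1.5):
    print(x, geometric_style_sum(x, 200), target(x))
```

Note that the function itself is perfectly well defined at x = 1.5; it is only the series that fails there.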


4.3.1 Problems 


Problem 4.3.1: What is the series for the derivative of ? Where does it con- 


verge? 


Problem 4.3.2: Check with series that the derivative of $e^x$ is $e^x$.


Problem 4.3.3: Check with series that the derivative of $\ln(1 + x)$ is $\dfrac{1}{1+x}$.


Problem 4.3.4: Find the Taylor series for $x^2 e^x$. Where does it converge?


Problem 4.3.5: Find series for these functions. Remember that “find the series” 
means find the general term and also the region of validity. 
a ee 
"14 5x2 
b. e73"” 
c. $\dfrac{1 - \cos(x)}{x^2}$
d. $\ln(1 + x^2)$
e. $\ln(2 + x)$ (factor out a 2...)
f. $(x + 1)\sin x$
g. $\dfrac{3}{2-x}$


Problem 4.3.6: In each case, give an expression in closed form for the infinite series, 
which are minor modifications of series you should (still!) know in your sleep. 


a (ee ee 
yp) ee: |e | ee 


b. $1 - x^3 + x^6 - x^9 + \dots$


x x3 a x? 


9 3.0 Age Sod 


Problem 4.3.7: Let 


sil 
DP eee 


a. Find the Taylor series for f(x) centered at a = 0. 
b. The graph on the next page shows a graph of $f(x)$ and its Taylor polynomial
of degree 12 (that is, the part of its Taylor series up to degree 12). Explain why
the graph looks the way it does.

Problem 4.3.8: On a large scale, the graph of $y = x^2 \sin(1/x)$ looks like a parabola.
You can see that clearly in the graph on the left below. Use the Taylor series for
sin(x) (centered at zero, as usual) to explain why.
(The first graph shows the region $-10 < x < 10$, a standard calculator window.
In the graph on the right, we see the region $-1 < x < 1$, where it is clear that the
graph is not really a parabola.)


[Figure: Two graphs of $x^2 \sin(1/x)$]


Problem 4.3.9: The electric potential generated by a uniformly charged disk of total 
charge Q and radius R is given by 


$$F(y) = \frac{2Q}{R^2}\left(\sqrt{y^2 + R^2} - y\right),$$


where y is the distance from the disk and F(y) is the potential at distance y. We want 
to understand what happens when y is very large, so do this: 


a. Change variables from y to x = 1/y, so that when y is big, x is close to zero. 
b. Find a Taylor polynomial centered at x = 0 for the potential in terms of x.
(Try to find ways to make your life easier. You can get the series from the ones 
you know in your sleep, but you need to fiddle a bit with the function first.) 


c. Use this to justify the following claim: “at points far from the disk, the potential 
is approximately Q/y.” 


Problem 4.3.10: As you may remember, Einstein’s general theory of relativity pre- 
dicts that a light ray passing close to the sun will be bent by a small amount. To test 
that prediction, we need to know exactly how much bending to expect. Doing that 
requires solving an equation similar to 


$$\sin\theta + b(1 + \cos\theta + \cos^2\theta) = 0.$$


We want to solve for $\theta$ in terms of $b$, but that is very hard to do.

In this situation we know that $b$ is a small positive number. If $b = 0$, then $\theta = 0$
is a solution, so it seems likely that there is a solution $\theta$ (depending on $b$) that is close
to 0. Since $\theta$ is close to 0 we should be able to approximate $\sin\theta$ and $\cos\theta$ using a
few terms of their Taylor series. Do this and find an approximate solution for $\theta$ in
terms of b. (The first thing to decide is how many terms to use; base your decision 
on what equations you can solve.) 


Finally, a few challenges for the brave: 


Challenge: Check that $e^{x+y} = e^x e^y$ by manipulating the series directly. (You'll need
to know how to expand $(x + y)^n$ to do this.)


Challenge: Find the first six terms of the series for $e^x \sin(x)$.


Challenge: Find the first few terms of the series for e™, 


Challenge: Find the first few terms of the series for $\tan(x) = \dfrac{\sin(x)}{\cos(x)}$. (This is very
hard to do by hand no matter how you go about it.)


4.4 New functions 


So far, we have started with a function f(x), usually some “black-box” function 
which we cannot compute by hand, such as the sine or the exponential, and then 
found a power series representation for it. 

What if we reverse the process? We might not be able to know what the resulting 
function is, but it will give us a function defined on the region of convergence. So, 
for example, we can invent a function called Charlie. (Why not?) 


$$\text{Charlie}(x) = 1 + \frac{x^2}{2!} + \frac{x^4}{4!} + \dots + \frac{x^{2n}}{(2n)!} + \dots$$

It’s actually fairly easy to figure out a formula for Charlie(x), but let’s not. How much
do we know just from the series? 

First of all, we can use the ratio test to find the values of x for which Charlie(x) 
is defined. I'll leave it to you: check that the radius of convergence is infinite, that is, 
Charlie(x) is defined for all x. 

Next, it’s clear that Charlie(0) = 1. That’s the one value we can be sure about. 
Notice also that all the terms involve even powers of x and there are no minus signs, 
so Charlie(x) > 0 for all x. 

For the derivative, we can also find a series by differentiating: 


$$\text{Charlie}'(x) = x + \frac{x^3}{3!} + \frac{x^5}{5!} + \dots$$

also convergent for every x. When x > 0 this is clearly positive, and when x < 0
it is negative, so Charlie(x) is decreasing when x < 0 and increasing when x > 0. 
That means there is a minimum point at x = 0. 

I'll leave it to you to check that Charlie’’(x) = Charlie(x). That puts Charlie 
somewhere between the exponential (which is its own derivative) and sine/cosine 
(which are equal to their fourth derivatives). 
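
The claim that Charlie is its own second derivative can be checked on the coefficients alone. The Python sketch below stores a truncated series as a list of exact fractions and differentiates it term by term twice; the function names and the list representation are assumptions made for this illustration.

```python
from fractions import Fraction
from math import factorial


def charlie_coefficients(degree):
    """Coefficients c[0..degree] of Charlie's series: 1/(2n)! on x^(2n), 0 on odd powers."""
    return [
        Fraction(1, factorial(k)) if k % 2 == 0 else Fraction(0)
        for k in range(degree + 1)
    ]


def differentiate(coeffs):
    """Term-by-term derivative: the term c[k] x^k becomes k c[k] x^(k-1)."""
    return [k * c for k, c in enumerate(coeffs)][1:]


c = charlie_coefficients(12)
second_derivative = differentiate(differentiate(c))

# Differentiating twice drops two coefficients, so compare with the start of c.
print(second_derivative == c[: len(second_derivative)])
```

Of course this only tests the coefficients up to degree 10; the general statement is the pencil-and-paper check left to you above.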

Finally, can we plot it? Well sure; if we tell the computer to plot a long enough 
chunk of Charlie’s series, we’ll get a good picture: 


sage: f(x)=sum(x^(2*n)/factorial(2*n),n,0,10)
sage: plot(f(x),(-2,2))


Of course, a natural question here would be who needs that function? If Charlie(x) 
is important, why isn’t there a Charlie button on my calculator? (Well, there is, but 
bear with me.) 

The answer is that this function answers a real-world question: it gives the shape
of a hanging chain when you hold it by its endpoints. People found that out because 
they knew some properties of the derivative of that curve, and the series can be 
derived from those properties. 

The next two sections will show you two examples of how series are used in the 
real world, sparing you the gory details but trying to make the main point: you’re 
learning this stuff because it is useful. 


4.4.1 Problems 


Problem 4.4.1: Figure out a formula for Charlie(x) in terms of standard functions 
you know. Is there a “Charlie button” on your calculator? Is there a SageMath com- 
mand? 


Problem 4.4.2: The function Si(x) is defined as 
$$\text{Si}(x) = \int_0^x \frac{\sin(t)}{t}\,dt.$$


(“Si” stands for “sine integral.”) 

Notice that we do not know an antiderivative for sin(x)/x. It can be proved that 
the antiderivative cannot be expressed in terms of the usual list of functions we know, 
so that in fact Si(x) is a new function. 


a. Find the Taylor series centered at a = 0 for Si(x). 


b. I want to use the series to compute Si(1) with an error smaller than 5x 10-°. Of 
course, I want to use as few terms of the series as possible. How many terms 
of the series do I need to use? 


Problem 4.4.3: Another example of a function that does not have an antiderivative
that can be written in terms of standard functions is $e^{-x^2}$. But we can always make
an antiderivative by integrating: if

$$F(x) = \int_0^x e^{-t^2}\,dt,$$

then the fundamental theorem of calculus tells us that $F'(x) = e^{-x^2}$. Find a power
series expansion centered around 0 for $F(x)$.


4.5 Doing things with power series 1 


Many problems in applied mathematics lead to differential equations, that is, 
equations involving a function and its derivatives. In physics, for example, we often 
express the position as a function of time: y = f(x). If so, y’ represents velocity and 
y’’ represents acceleration. Since force is equal to mass times acceleration, we often
end up with an equation connecting y, y’, and y”. 

Sometimes the resulting equation is easy to solve. The simple harmonic oscillator,
for example, leads to the equation $y'' + \omega^2 y = 0$ and it’s not too hard to see that
both $y = \sin(\omega x)$ and $y = \cos(\omega x)$ are solutions. But sometimes we get something
much more difficult to solve. When that happens, series can sometimes be used to 
find solutions. 

This section is about the equation $xy'' + y' + xy = 0$, which also comes up in
physics. It turns out that there is no easy way to find the right function y = f(x) that 
makes this be true, but we can do it by using series. I'll work only on the case where 
we also know the initial value f(0) = 1. 

Suppose $y = f(x)$. Then we are trying to find a function $f(x)$ that satisfies
$f(0) = 1$ and

$$x f''(x) + f'(x) + x f(x) = 0$$


for all x. Since there is no obvious function that does this, we will find a series 
representation instead. The computation takes several steps, so gird up your loins. 
Suppose
$$f(x) = C_0 + C_1 x + C_2 x^2 + \dots + C_n x^n + \dots,$$
where the coefficients $C_n$ are still to be determined. We will try to find the coefficients


by requiring that the function satisfy our differential equation. 


Step 1: Since we know $f(0) = 1$, it is easy to determine $C_0$: just plug in zero on
both sides. We get $C_0 = 1$.


Step 2: Since we are allowed to take derivatives of power series, we know that 
$$f'(x) = C_1 + 2C_2 x + 3C_3 x^2 + \dots + n C_n x^{n-1} + \dots$$


and
$$f''(x) = 2C_2 + 3\cdot 2\,C_3 x + 4\cdot 3\,C_4 x^2 + \dots + n(n-1)\,C_n x^{n-2} + \dots$$


Step 3: We can now use our answers to write down the series for $x f''(x) + f'(x) + x f(x)$.
The algebra is annoying but not awful. We have the three series for
$f(x)$, $f'(x)$, and $f''(x)$. Multiplying by x just changes the powers of x.

It’s convenient to line up the series so that the terms involving the same power
$x^n$ appear above each other. If you do that, you get

$$\begin{aligned}
x f(x) &= C_0 x + C_1 x^2 + C_2 x^3 + \dots + C_{n-1} x^n + \dots \\
f'(x) &= C_1 + 2C_2 x + 3C_3 x^2 + 4C_4 x^3 + \dots + (n+1)\,C_{n+1} x^n + \dots \\
x f''(x) &= 2C_2 x + 3\cdot 2\,C_3 x^2 + 4\cdot 3\,C_4 x^3 + \dots + (n+1)n\,C_{n+1} x^n + \dots
\end{aligned}$$


So the sum, which is the series we want to equal zero, has n-th term equal to

$$C_{n-1}x^n + (n+1)\,C_{n+1}x^n + (n+1)n\,C_{n+1}x^n = \left[C_{n-1} + (n+1)^2 C_{n+1}\right]x^n.$$

This doesn’t work for n = 0 (since there’s no $C_{-1}$, after all), but it’s easy to see that
the term of degree zero is just equal to $C_1$. So the series is

$$x f''(x) + f'(x) + x f(x) = C_1 + \sum_{n=1}^{\infty}\left[C_{n-1} + (n+1)^2 C_{n+1}\right]x^n.$$


Step 4: Since we want $x f''(x) + f'(x) + x f(x) = 0$, this last series must have
all of its coefficients equal to zero. Let’s use this to find $C_1, C_2, C_3, C_4, C_5, C_6$.
The first thing to notice is that $C_1 = 0$. Next, let’s look at the other coefficients.
We have
$$C_{n-1} + (n+1)^2 C_{n+1} = 0$$

for every $n \geq 1$. To make this easier, let $k = n + 1$ and rewrite it as
$$C_{k-2} + k^2 C_k = 0$$

or, even better,

$$C_k = -\frac{1}{k^2}\,C_{k-2}.$$

Now it’s easy to use the formula to see that the fact that $C_1 = 0$ makes $C_3 = C_5 = 0$,
and in fact that all the odd-numbered coefficients are zero.
For the others, we know $C_0 = 1$, and we have

$$C_2 = -\frac{1}{4}, \qquad C_4 = \frac{1}{4\cdot 16}, \qquad C_6 = -\frac{1}{4\cdot 16\cdot 36}.$$


Step 5: The hard thing is finding a general pattern for the even-numbered coefficients!
We know

$$C_k = -\frac{1}{k^2}\,C_{k-2},$$
however. Rewriting that for $C_{2k}$ gives
$$C_{2k} = -\frac{1}{(2k)^2}\,C_{2k-2} = -\frac{1}{2^2 k^2}\,C_{2k-2}.$$

Since we start from $C_0 = 1$, the denominator for $C_{2k}$ will have $k$ factors of $2^2$ and
then
$$k^2 (k-1)^2 (k-2)^2 \cdots = (k!)^2.$$

And the sign changes at each step. So

$$C_{2k} = (-1)^k \frac{1}{2^{2k}\,(k!)^2}.$$

Ooof.
Remembering that $C_{2n}$ is the coefficient of $x^{2n}$, the series for $f(x)$ is

$$f(x) = 1 - \frac{1}{4}x^2 + \frac{1}{64}x^4 - \dots + (-1)^n \frac{x^{2n}}{2^{2n}\,(n!)^2} + \dots$$

Whenever this converges it will define a function that solves our differential equation.
Notice that this is not any of the series we already know, so when it converges 
the series will define a “new” function, i.e., one that is not yet in our repertoire. 
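
The pattern-spotting in Step 5 is the easiest place to slip, so here is a hedged Python sketch that runs the recurrence from Step 4 with exact fractions and compares the results with the closed formula. The function names are invented for this example; the recurrence and the formula are the ones derived above.

```python
from fractions import Fraction
from math import factorial


def coefficients_from_recurrence(max_degree):
    """C_0 = 1, C_1 = 0, and C_k = -C_(k-2) / k^2 for k >= 2."""
    c = [Fraction(1), Fraction(0)]
    for k in range(2, max_degree + 1):
        c.append(-c[k - 2] / k ** 2)
    return c[: max_degree + 1]


def closed_form(k):
    """The claimed pattern C_(2k) = (-1)^k / (2^(2k) (k!)^2)."""
    return Fraction((-1) ** k, 2 ** (2 * k) * factorial(k) ** 2)


c = coefficients_from_recurrence(12)
print(c[2], c[4], c[6])                                   # the values found in Step 4
print(all(c[2 * k] == closed_form(k) for k in range(7)))  # does the pattern hold?
print(all(c[k] == 0 for k in range(1, 12, 2)))            # odd coefficients vanish
```

Agreement up to degree 12 is evidence, not a proof; the argument in Step 5 is what covers every k.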


Step 6: Using the ratio test, it’s easy to show that the radius of convergence is
$R = \infty$, so this series converges for every x. I’ll let you check that this is right.


So in fact we have found a solution of our equation that is valid for every x. It is 
a function we don’t know about yet, but in fact it is a standard function that appears 
in many places. The usual name for the function f(x) is “the Bessel function of the 
first kind and of order 0.” A common notation for it is $J_0$. In SageMath it is called
bessel_J(0,x).

Let’s investigate $J_0(x)$ a little bit. Some things are easy. We know $J_0(0) = 1$, and
since all the terms of the series involve even powers, we also know that $J_0(-x) = J_0(x)$.
We can find a series for $J_0'(x)$ by differentiating this one term by term.

$$J_0'(x) = -\frac{1}{2}x + \frac{1}{16}x^3 - \frac{1}{384}x^5 + \dots + (-1)^n \frac{1}{2^{2n-1}\,n!\,(n-1)!}\,x^{2n-1} + \dots,$$

still valid for all x. We can get a second derivative the same way. That doesn’t tell us
a whole lot, but it does allow us to check, for example, that $J_0(x)$ has a maximum at
x = 0. On the other hand, the series doesn’t allow us to find any other critical points,
since we do not know how to solve $J_0'(x) = 0$.

We can see what the graph of $J_0(x)$ looks like (at least approximately) by plotting
enough terms of the series. We don’t really know how many terms are enough, so
we need to experiment. Let’s try going until n = 20, so until the term of degree 40.


sage: ff(x)=sum((-1)^n*x^(2*n)/(2^(2*n)*factorial(n)^2),n,0,20)
sage: plot(ff(x),(-10,10))


Here is the graph: 


This looks something like a cosine function, but the amplitude is decreasing as
we move away from x = 0. It’s hard to see from this graph whether the peaks and
troughs are getting farther away from each other or not.

Since SageMath knows $J_0(x)$ itself, we can also ask it to plot that. The result
looks a lot like the picture we just saw, not surprisingly.


sage: plot(bessel_J(0,x),(-10,10))


We should plot both the series and the function at once; unfortunately, between
$-10$ and $10$ the two look identical! So instead let’s look from $-50$ to $50$:


sage: plot([bessel_J(0,x),ff(x)],(-50,50),ymax=1,ymin=-1)


So the approximation is quite good in the middle of the graph, but to get a bigger
range of x we would need to work with more than 20 terms. 
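
To put rough numbers on "how many terms are enough", here is a Python sketch that compares the sum up to n = 20 with a much longer sum at a few values of x. Exact fractions are used because the terms get huge and alternate in sign, so ordinary floating-point sums would lose accuracy for large x; that choice, the name `j0_partial`, and the cutoff n = 80 for the "long" sum are all assumptions of this sketch.

```python
from fractions import Fraction
from math import factorial


def j0_partial(x, last_n):
    """Sum of (-1)^n x^(2n) / (2^(2n) (n!)^2) for n = 0, ..., last_n, added exactly."""
    x = Fraction(x)
    total = sum(
        Fraction((-1) ** n) * x ** (2 * n) / (2 ** (2 * n) * factorial(n) ** 2)
        for n in range(last_n + 1)
    )
    return float(total)


# Short sum (up to n = 20) next to a much longer one (up to n = 80).
for x in (5, 10, 20):
    print(x, j0_partial(x, 20), j0_partial(x, 80))
```

For the smaller values of x the two sums agree closely, while at x = 20 the short sum is already far off, which matches what the wide plot shows.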

There is more than you would ever want to know about $J_0$ (and its friends and
companions $J_n$) at Wolfram MathWorld. And even more at the National Institute of
Standards’ database of mathematical functions. (Is it surprising that a government 
agency has a database of mathematical functions? You might enjoy looking around 
that site.) But the point for us is just that we have found a solution to the differential 
equation using series, and it’s a useful function in the real world. 


4.6 Doing things with power series 2 


One of the cool things that can be done with power series is to use them to 
encode sequences of numbers that we happen to be interested in. For example, the 
power series for $e^x$ encodes the numbers $1/n!$. Here are two examples from real
(mathematical) life. 


4.6.1 Fibonacci numbers 


You have probably heard of the Fibonacci numbers: 


$$f_0 = 1, \qquad f_1 = 1, \qquad f_n = f_{n-1} + f_{n-2} \quad \text{for all } n \geq 2.$$


We can find them step by step: each time we add the last two numbers to find the
next one:
$$1, 1, 2, 3, 5, 8, 13, 21, 34, \dots$$


This sequence has all sorts of interesting properties and even shows up in nature 
sometimes. 

How can we study these numbers better? Well, one standard trick is to use them 
to create a new function: 


$$F(x) = \sum_{n=0}^{\infty} f_n x^n = 1 + x + 2x^2 + 3x^3 + 5x^4 + 8x^5 + 13x^6 + \dots$$


If we can figure out things about $F(x)$ then we may get information about the numbers
$f_n$.

But it’s actually easy to find a formula for F(x). The key thing to notice is that 
multiplying by x shifts all the coefficients. So 


$$xF(x) = \sum_{n=0}^{\infty} f_n x^{n+1} = \sum_{k=1}^{\infty} f_{k-1} x^k = x + x^2 + 2x^3 + 3x^4 + 5x^5 + 8x^6 + 13x^7 + \dots$$

and

$$x^2F(x) = \sum_{n=0}^{\infty} f_n x^{n+2} = \sum_{k=2}^{\infty} f_{k-2} x^k = x^2 + x^3 + 2x^4 + 3x^5 + 5x^6 + 8x^7 + 13x^8 + \dots$$

This sets us up to use the recursion $f_n = f_{n-1} + f_{n-2}$: if we add the last two series
we get

$$xF(x) + x^2F(x) = x + \sum_{k=2}^{\infty} (f_{k-1} + f_{k-2})\,x^k = x + \sum_{k=2}^{\infty} f_k x^k = F(x) - 1.$$

Solving for $F(x)$ gives
$$F(x) - xF(x) - x^2F(x) = 1$$

and so
$$F(x) = \frac{1}{1 - x - x^2}.$$
That’s remarkable! The entire Fibonacci sequence is therefore hidden in the function
$$\frac{1}{1 - x - x^2}.$$


It’s kind of amazing. 

We can do a lot more from here. For example, if we factor $1 - x - x^2$ and then
use the “partial fractions” method from integration, we can write $F(x)$ as the sum
of two geometric series, whose convergence behavior we know well. We can then
use this to get lots of information about the Fibonacci numbers. For example, we can
find a non-recursive formula for $f_n$.
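
To see the Fibonacci numbers actually come out of 1/(1 − x − x²), one can expand the reciprocal as a power series by matching coefficients. The Python sketch below does that for any polynomial with constant term 1; the function names and the coefficient-list representation are assumptions made for this illustration.

```python
def fibonacci(count):
    """First `count` Fibonacci numbers, with f_0 = f_1 = 1 as in the text."""
    f = [1, 1]
    while len(f) < count:
        f.append(f[-1] + f[-2])
    return f[:count]


def reciprocal_series(denominator, count):
    """First `count` coefficients of 1/D(x), where D has constant term 1.

    If D(x) B(x) = 1 with D(x) = 1 + d_1 x + d_2 x^2 + ..., then b_0 = 1 and
    b_n = -(d_1 b_(n-1) + d_2 b_(n-2) + ...) for n >= 1.
    """
    b = [1]
    for n in range(1, count):
        highest = min(n, len(denominator) - 1)
        b.append(-sum(denominator[j] * b[n - j] for j in range(1, highest + 1)))
    return b


# 1 - x - x^2 written as the coefficient list [1, -1, -1]
print(reciprocal_series([1, -1, -1], 10))
print(fibonacci(10))
```

For this denominator the rule for b_n is exactly the Fibonacci recursion, which is another way of seeing why the formula for F(x) works.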


4.6.2 Generating functions 


The process we have just gone through is known as finding a “generating function” 
for the sequence $f_n$. This idea is used all over mathematics. If we have a sequence
we like, say

$$a_0, a_1, a_2, \dots$$

we can build a generating function by using it as the coefficients of a power series
$$a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \dots$$

Sometimes we use the $a_n$ slightly differently, putting factorials in the denominator;
this gives the “exponential generating function”
$$a_0 + a_1 x + \frac{a_2}{2!}x^2 + \frac{a_3}{3!}x^3 + \frac{a_4}{4!}x^4 + \dots$$
If we can figure out a formula for either version, we can find out a lot more about the
$a_n$.

Here’s an example from probability theory. We often study probability distributions
via their “moments.” There is one moment $M_n$ for each positive integer n. Of
course we pack all the moments into a generating function, in this case an exponential
generating function
$$1 + M_1 t + \frac{M_2}{2!}t^2 + \frac{M_3}{3!}t^3 + \frac{M_4}{4!}t^4 + \dots$$

This is known as the MGF of our probability distribution (for “moment-generating
function”), and one proves that if its radius of convergence is nonzero the MGF
completely determines the probability distribution. So the MGF becomes a very powerful
tool in the theory: if we want to prove that some distribution is what we think it is, 
it’s enough to show that the MGFs are the same. 

There is a famous book about all this called Generatingfunctionology, by Herbert 
S. Wilf. It explains many of the tricks involving generating functions. It’s not an easy 
book, but if you’re interested in taking a look it can be found online. 




5 Distant Mountains 


Looking at the horizon we see mountains in the distance. This chapter is a brief 
invitation to visit the mountains and explore. I have given few of the technical details 
and no proofs at all. Instead, I point out some interesting bits of scenery. 


5.1 Complex numbers 


As you probably know, complex numbers are numbers of the form $a + bi$ with
$a$ and $b$ real numbers and $i$ being a symbol such that $i^2 = -1$. We add and multiply
them using the usual rules plus the rule that $i^2 = -1$. For dividing, we use an extra
trick: when we multiply $c + di$ by $c - di$ we get $c^2 + d^2$, which is a real number.
Here are the formulas:

$$(a + bi) + (c + di) = (a + c) + (b + d)i$$
$$(a + bi)(c + di) = (ac - bd) + (ad + bc)i$$
$$\frac{a + bi}{c + di} = \frac{(a + bi)(c - di)}{c^2 + d^2} = \frac{ac + bd}{c^2 + d^2} - \frac{ad - bc}{c^2 + d^2}\,i$$


So, to divide, we first multiply numerator and denominator by c — di and then work 
out the rest. 
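
As a quick check of the division formula, here is a Python sketch that applies it with exact fractions and compares the answer with Python's built-in complex arithmetic. The function `divide` is a made-up helper for this example; note that Python writes i as `j`, where SageMath uses `I`.

```python
from fractions import Fraction


def divide(a, b, c, d):
    """Real and imaginary parts of (a + bi)/(c + di), following the formula above."""
    denominator = c * c + d * d          # (c + di)(c - di) = c^2 + d^2, a real number
    real_part = (a * c + b * d) / denominator
    imaginary_part = (b * c - a * d) / denominator
    return real_part, imaginary_part


# (2 + 3i)/(1 - 8i) with exact fractions
print(divide(Fraction(2), Fraction(3), Fraction(1), Fraction(-8)))

# The same quotient using Python's built-in complex numbers
print((2 + 3j) / (1 - 8j))
```

The first line prints exact fractions for the real and imaginary parts; the second prints the same number as decimals.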

A silly remark that may help avoid confusion: since bi and ib are the same thing, 
we can write a + ib as well. We often do this when b is a complicated symbol. So 
you will usually see i sin(x) instead of sin(x)i. A complex variable is usually written 
as $z = x + iy$, with the convention that x and y stand for real numbers.¹

If we have a complex number a + bi, we call a its real part and b its imaginary 
part. The usual notation is Re(a + bi) = a and Im(a + bi) = b. Notice that both real 
and imaginary parts are real numbers. 

An important operation with complex numbers is called complex conjugation,
denoted by a bar: if $z = a + bi$ then $\bar{z} = a - bi$. One of the reasons conjugation is
important is that it interacts well with the operations:

$$\overline{z + w} = \bar{z} + \bar{w} \quad \text{and} \quad \overline{zw} = \bar{z}\,\bar{w}.$$
There is also a way of measuring the “size” of a complex number:

$$|a + bi| = \sqrt{a^2 + b^2}.$$

¹ I don’t know why I tend to write a + bi when it’s a and b and x + iy when it is x and y, but that’s what
my fingers want me to do.

I call this the absolute value of $a + bi$, but some people prefer to say “modulus”
instead. Notice that if $z = a + bi$ we have

$$z\bar{z} = a^2 + b^2 = |z|^2.$$
In particular, it follows that $|zw| = |z|\,|w|$.
SageMath can work with complex numbers. It uses I for our i, but otherwise it’s 


just as you would expect. Do these by hand to check that SageMath gets them right: 


sage: (2+3*I)+(1-8*I)
-5*I + 3
sage: (2+3*I)*(1-8*I)
-13*I + 26
sage: (2+3*I)/(1-8*I)
19/65*I - 22/65
sage: abs(3+2*I)
sqrt(13)
sage: conjugate(3+2*I)
-2*I + 3


We can visualize complex numbers on a plane: just label the point (a, b) with the 
complex number a + bi. Let’s plot the point 2 + i. 


[Figure: the point $2 + i$ plotted in the plane]


For obvious reasons, we then call the x-axis the “real axis” and the y-axis the “imag- 
inary axis.” 

The basic operations all translate to plane geometry. By the Pythagorean theorem,
the absolute value $|a + bi| = \sqrt{a^2 + b^2}$ is just the length of the segment from
(0, 0) to (a, b). Adding two complex numbers is just the usual addition of vectors 
using the parallelogram law. One can interpret the product geometrically as well, 
but it’s a little more complicated. 

Since we can add and multiply complex numbers, it makes sense to plug them
into a power series! And since we can measure how big a complex number is, we
can make sense of convergence too (in fact, using exactly the same words as before).
So, for example, we can define something like $e^{i\pi}$ to be what we get when we plug
in $i\pi$ into the usual power series for $e^x$.


Let’s try it. It helps to work out the powers of $i$ first. Since $i^2 = -1$, we have
$i^3 = -i$ and $i^4 = 1$. And then $i^5 = i$ starts the cycle again. So the powers of $i\pi$ will
be

$$1,\ i\pi,\ -\pi^2,\ -i\pi^3,\ \pi^4,\ i\pi^5,\ \dots$$
Plugging those in,
$$e^{i\pi} = 1 + i\pi + \frac{(i\pi)^2}{2!} + \frac{(i\pi)^3}{3!} + \frac{(i\pi)^4}{4!} + \frac{(i\pi)^5}{5!} + \dots$$
$$= 1 + i\pi - \frac{\pi^2}{2!} - i\frac{\pi^3}{3!} + \frac{\pi^4}{4!} + i\frac{\pi^5}{5!} - \dots$$
$$= \left(1 - \frac{\pi^2}{2!} + \frac{\pi^4}{4!} - \dots\right) + i\left(\pi - \frac{\pi^3}{3!} + \frac{\pi^5}{5!} - \dots\right),$$

where at the end we grouped all the terms without an $i$ in one parenthesis and all the
terms with an $i$ in the other. If you stare at that, you’ll see that the first parenthesis
ends up being what we get when we plug $x = \pi$ in the series for cos(x), while the
second parenthesis is what we get when we plug in $x = \pi$ into sin(x). So both series
converge and we have

$$e^{i\pi} = \cos(\pi) + i\sin(\pi) = -1,$$


which we can also write as
$$e^{i\pi} + 1 = 0.$$
Some people call this “the most beautiful formula in mathematics.” 
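
If you would like to watch this happen numerically, here is a short Python sketch that plugs iπ into the exponential series and prints a few partial sums. The function name `exp_partial_sum` is invented for this example; Python writes i as `1j`.

```python
import math


def exp_partial_sum(z, terms):
    """Partial sum 1 + z + z^2/2! + ... with `terms` terms; z may be complex."""
    total = 0
    term = 1                      # current term z^n / n!, starting with n = 0
    for n in range(terms):
        total += term
        term = term * z / (n + 1)  # turn z^n / n! into z^(n+1) / (n+1)!
    return total


z = 1j * math.pi
for terms in (5, 10, 20, 30):
    print(terms, exp_partial_sum(z, terms))
```

The real parts should home in on −1 and the imaginary parts on 0 as more terms are added.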
In fact, our computation would have worked just as well for $e^{iy}$, giving us
$e^{iy} = \cos(y) + i\sin(y)$. Multiplying both sides by $e^x$ we get a formula for the exponential
of any complex number:

$$e^z = e^{x+iy} = e^x(\cos(y) + i\sin(y)).$$


Once complex numbers are in play, then, there is a deep link between exponentials 
and trigonometric functions, which we discover using power series. 

All of our theory works just as well—maybe even better—with complex num- 
bers. In fact, every single one of the theorems we found for real power series is also 
true for complex power series. And the proofs are pretty much the same as well. The 
only one of our proofs that depends on working with series of real numbers is the 
theorem about absolute convergence. That theorem is true for complex numbers too, 
but the proof has to be different. 

One pleasant thing about working with complex power series is that the radius
of convergence is really a radius! After all, $|x + iy| < R$ translates to $\sqrt{x^2 + y^2} < R$
and so $x^2 + y^2 < R^2$, which says that $(x, y)$ is inside a circle of radius R. The big
difference is that where before we had only two “endpoints,” namely, $x = -R$ and
$x = R$, now we have the entire circle $x^2 + y^2 = R^2$, and while the series will converge
inside the circle, the behavior at the boundary can be very complicated.

Sometimes the complex world affects what happens with real numbers. Here’s 
an elementary example. We saw that 


$$\frac{1}{1+x^2} = 1 - x^2 + x^4 - x^6 + \dots \quad \text{when } |x| < 1.$$

That’s kind of strange, since the left-hand side is well defined for every real number
x. Why does the series only work for |x| < 1? Well, in the complex world, if x = i
we have $1 + x^2 = 0$, so the function is not defined. Notice that $|i| = 1$. It turns
out that a power series will always have the same radius of convergence whether we
plug in real or complex numbers. So the discontinuity at x = i forces the radius of
convergence to be at most R = 1 even if we are only working with real numbers.
Somehow the real series “knows” about the problem when x = i.

There are many very nice things about complex numbers. For example, the equation
$e^{iy} = \cos(y) + i\sin(y)$ allows us to move from trigonometric functions to exponentials.
Exponentials are much easier to work with! Many trig identities become
very easy once we translate them to exponentials. 


5.1.1 Problems 


Problem 5.1.1: We pointed out above that if a is some complex number and R is a
positive real number the inequality $|z - a| < R$ defines an open disk with center a
and radius R. Describe the regions defined by the formulas

a. $\operatorname{Re}(z) > 1$,
b. $\operatorname{Im}(z) > 0$,
c. $|z - 1| = |z - i|$,
d. $z = \bar{z}$.


Problem 5.1.2: It’s fairly easy to see that $\overline{z + w} = \bar{z} + \bar{w}$. Check that $\overline{zw} = \bar{z}\,\bar{w}$.
Explain how it follows that $|zw| = |z|\,|w|$.


Problem 5.1.3: If z and w are complex numbers, is it still true that $e^{z+w} = e^z e^w$?
(Consider the first challenge problem on page 101.)


Problem 5.1.4: Use the fact that $e^{i\theta + i\varphi} = e^{i\theta} e^{i\varphi}$ to get formulas for $\cos(\theta + \varphi)$ and
$\sin(\theta + \varphi)$.


Problem 5.1.5: Since adding two complex numbers is just vector addition on the 
plane, we can try to visualize a series the following way. Take a series 


$$c_0 + c_1 + c_2 + c_3 + \dots + c_n + \dots$$


where the $c_i$ are now complex numbers. Start at (0,0) and draw the segment to the
point representing $c_0$. Now draw the vector corresponding to $c_1$ with its tail at the
point $c_0$. The head will be at $c_0 + c_1$, the first partial sum. Keep going. You will get
a chain of vectors creating a kind of path in the plane. If the series converges, you
will see the path approximate the limit.


a. Try this on the series for $e^{i\pi}$.
b. Try it for the geometric series $f(z) = 1/(1 - z)$ and $z = i/2$.
c. Try it for $f(z) = 1/(1 - z)$ and $z = (1 + i)/2$.


d. What happens if you try it for a value of z for which the series diverges?


5.1.2 Notes 


The fundamental role of power series in the theory of functions of a complex variable 
is a major theme of that theory. You can find a reasonably gentle account of this in 
[19, Chapters 25-27]. My two favorite books on the subject are [8] and [3], but 
reading them will take serious effort. 


5.2 Series in general and the idea of uniformity 


Power series are very useful, but we could certainly try other kinds of series. 
Sometimes we can use our theory to understand their behavior, sometimes we need 
new theory. 

Here’s an easy one. In problem 2.9.5 we looked at

$$\sum_{n=0}^{\infty} \frac{3^n}{x^n}.$$

This is just a geometric series, since we are adding powers of $3/x$. So we know that
it converges when $|3/x| < 1$, which just says $|x| > 3$. Convergence happens if we
are far away from zero instead of in an interval centered at zero.

In this case, we also know the sum.

$$1 + \frac{3}{x} + \frac{3^2}{x^2} + \dots + \frac{3^n}{x^n} + \dots = \frac{1}{1 - 3/x} = \frac{x}{x - 3}.$$
This is a fairly artificial example, since it’s just a modification of a known series. But
once things get more complicated it is much harder. 
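
The "far away from zero" region is easy to see in a small numerical experiment. In this Python sketch the names `inverse_power_sum` and `closed_form` are made up for the illustration; the points x = 4 and x = −6 lie in |x| > 3, while x = 2 does not.

```python
def inverse_power_sum(x, terms):
    """Sum of (3/x)^n for n = 0, ..., terms - 1."""
    return sum((3 / x) ** n for n in range(terms))


def closed_form(x):
    """The sum x / (x - 3), valid only when |x| > 3."""
    return x / (x - 3)


for x in (4.0, -6.0, 2.0):
    print(x, inverse_power_sum(x, 50), closed_form(x))
```

At x = 2 the formula x/(x − 3) still produces a number, but the partial sums have nothing to do with it: they just keep growing.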

It turns out that considering series in general brings up issues that do not trouble 
us in the case of power series. This section focuses on one particular example of the 
most important issue. 

We will work with this series: 


$$f(x) = \sin(x) + \frac{\sin(2x)}{2} + \frac{\sin(3x)}{3} + \frac{\sin(4x)}{4} + \dots$$

This is an example of a Fourier series, which we will discuss in the next section. We
can ask the standard questions: does it converge, and if so what is the sum? 

For each value of x, we get an infinite sum of numbers, which we can then study. 
When that series converges, we get some value f(x). So the more precise question 
would be: for which values of x does that series converge and to what? 

That is hard to determine with bare hands, but we can use the theory of Fourier
series to find the answer. The one obvious fact is that the series converges when
x = 0, since every term is 0 in that case. (The same happens for all integer multiples
of $\pi$.) It’s also easy to see that since the terms all involve the sine function, everything
will just repeat in intervals of $2\pi$. Once we know what happens between $-\pi$ and $\pi$
we know everything.

It turns out, in fact, that the series converges for every x. The sum $f(x)$ has
to be defined piecewise: we already know $f(0) = 0$; for $-\pi < x < 0$, we have
$f(x) = -\frac{1}{2}(x + \pi)$; for $0 < x < \pi$ it is $f(x) = \frac{1}{2}(\pi - x)$. For other values of x we
have to shift things by multiples of $2\pi$. Let me plot that function from $-\pi$ to $\pi$ (blue)
together with what we get by summing 10 terms of the series (green).


Several things here are very different from the power series we are used to. The 
most obvious one is that the sum is not a continuous function! In fact, 
$$\lim_{x\to 0^-} f(x) = -\frac{\pi}{2}, \qquad f(0) = 0, \qquad \lim_{x\to 0^+} f(x) = \frac{\pi}{2}.$$
Every function given by a power series is not only continuous, but in fact infinitely 
differentiable. So this is very different behavior. 
Since the series is made with sine waves, the partial sums are always going to 
be wavy. But since the limit is a function made up of straight lines, we expect the 
waviness to get smaller and smaller. 

Near zero the function needs to make a big jump from the left-hand limit $-\pi/2$
through 0 to the right-hand limit $\pi/2$. As if it was getting ready to jump, the graph
starts to oscillate more and overshoots slightly, dipping below the limit before the 
jump and reaching above it just after. 

To see that something strange is happening let’s make a longer partial sum. Here 
is what we get by adding 100 terms. The jump is so close to vertical that it disappears 
behind the y-axis, but we can see the rest. 


Away from zero we pretty much just see the straight lines. But here’s a weird thing: 
near zero, the wavy part has gotten much narrower, but the size of the overshoot is 
almost the same. We can prove that it never goes away! 

Imagine that we are adding more and more terms of the series. You are watching 
the results from a point near zero, say x = 0.3. At first you are in the wavy part, but 
as we add more and more terms the wavy part gets thinner. Eventually, it gets so thin 
that it doesn’t affect you at x = 0.3; from then on you are pretty much on the straight 
line. So the series converges to f (0.3), as it should. 

But no matter how many terms you add the overshoot continues to exist, affecting 
points much closer to zero. So no matter how many terms we add, at some points the 
partial sums will not be really close to the function. 
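
The stubbornness of the overshoot can be measured directly. The Python sketch below looks just to the right of zero and records how far the partial sum rises above the limit function (π − x)/2 there. The search window 0 < x ≤ 4π/N and the grid of 400 sample points are assumptions of this sketch, chosen because the first hump sits in that window and gets narrower as the number of terms N grows.

```python
import math


def partial_sum(x, terms):
    """sin(x) + sin(2x)/2 + ... + sin(terms * x)/terms."""
    return sum(math.sin(k * x) / k for k in range(1, terms + 1))


def limit_function(x):
    """The sum of the full series for 0 < x < pi."""
    return (math.pi - x) / 2


def overshoot(terms, samples=400):
    """Largest value of (partial sum - limit) on a grid in 0 < x <= 4*pi/terms."""
    width = 4 * math.pi / terms
    grid = (width * i / samples for i in range(1, samples + 1))
    return max(partial_sum(x, terms) - limit_function(x) for x in grid)


for terms in (10, 100, 1000):
    print(terms, round(overshoot(terms), 4))
```

The window shrinks by a factor of 100 between the first and last line, but the height of the overshoot should barely change.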

One way to describe what is going on is this. Suppose you want to get within 0.05 
of the true value of f(x). Let’s choose a small window, say $-0.2 < x < 0.2$, and plot
$f(x) + 0.05$ and $f(x) - 0.05$ in red. We could call the values between $f(x) - 0.05$
and f(x) + 0.05 the Goldilocks Zone. In green, we plot a really big partial sum, say 
1000 terms. This is what we get near x = 0: 

[Figure: the 1000-term partial sum (green) and the curves f(x) ± 0.05 (red) for −0.2 < x < 0.2.]


For most values of x the partial sum is inside the Goldilocks Zone, but the overshoot 
still happens for x very near zero. If you want to make the difference less than 0.05 
for some chosen very small x, you need to add even more terms of the series. But 
for x even closer to zero the overshoot will still exist. So the closer to zero you get, 
the more terms you need to get the partial sum to fall within the zone. 

With power series, this never happens. If you focus on an interval -a < x < a 
inside the region of convergence, you can force the entire partial sum into a band 
surrounding the sum. This is described by saying that the convergence is uniform in 
that interval: you can make the difference between partial sums and the final sum 
small everywhere at once. In our series that doesn’t happen. The convergence is not
uniform in the interval $-\pi < x < \pi$. We say that the series converges pointwise, for
each value of x, but not uniformly.
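
To make the contrast concrete, here is a rough numerical sketch in plain Python. It compares the worst gap on a grid of sample points for a power series (the geometric series on the interval from −0.5 to 0.5, my choice of example) with the worst gap for the sine series just to the right of zero. The grid, the numbers of terms, and the closed form (π − x)/2 for the limit of the sine series are assumptions made for this illustration.

```python
import math

def geometric_partial(x, terms):
    """1 + x + x^2 + ... with the given number of terms."""
    return sum(x ** k for k in range(terms))

def sine_partial(x, terms):
    """Sum of sin(k*x)/k for k = 1, ..., terms."""
    return sum(math.sin(k * x) / k for k in range(1, terms + 1))

def sample_points(left, right, count):
    """Equally spaced sample points from left to right, inclusive."""
    step = (right - left) / (count - 1)
    return [left + i * step for i in range(count)]

def worst_gap(partial, limit, points, terms):
    """Largest distance between partial sum and limit over the sample points."""
    return max(abs(partial(x, terms) - limit(x)) for x in points)

power_points = sample_points(-0.5, 0.5, 201)
sine_points = sample_points(0.0005, 0.2, 400)

power_gaps = {n: worst_gap(geometric_partial, lambda x: 1 / (1 - x), power_points, n)
              for n in (5, 10, 20)}
sine_gaps = {n: worst_gap(sine_partial, lambda x: (math.pi - x) / 2, sine_points, n)
             for n in (25, 100, 400)}

print(power_gaps)  # shrinks quickly: the whole partial sum fits in a thin band
print(sine_gaps)   # stays large: some sample point near zero is always far off
```

A grid can only suggest what happens on the whole interval, of course, so this is evidence and not a proof.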

There is a huge difference between uniform convergence and pointwise conver- 
gence. For example, if all the terms are continuous functions and the convergence 
is uniform, then the sum is continuous too. But, as in our example, if we only have 
pointwise convergence it is perfectly possible for the sum to have jumps. Similar 
problems appear when we try to integrate or differentiate series. We can do it for
power series because the convergence is always uniform in every closed interval
inside the region of convergence. But for more general series we need to check
whether the convergence is uniform.


5.2.1 Problems 
Problem 5.2.1: We noticed above that 


\[ \frac{1}{x} + \frac{3}{x^2} + \frac{9}{x^3} + \dots = \frac{1/x}{1 - 3/x} = \frac{1}{x-3}. \]

What does the Taylor series centered at a = 0 for this function look like? Where
does it converge?


Problem 5.2.2: (A nontrivial example.) For which x does the series

\[ \sum_{n=1}^{\infty} \frac{x}{n(1 + nx^2)} \]

converge? Is the convergence uniform?


Challenge: Here’s a strange one. Look at the series

\[ \frac{1+x}{1-x} + \frac{2x}{x^2-1} + \frac{2x^2}{x^4-1} + \frac{2x^4}{x^8-1} + \dots \]

where the exponents are powers of two, so the numerators are $2x^{2^n}$ and the denominators
are $x^{2^{n+1}} - 1$. If you plot a partial sum, you will see that the series seems to
converge to 1 when |x| < 1 and to −1 when |x| > 1. Can you prove that?
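
If you prefer numbers to plots, here is a small sketch in plain Python that evaluates a partial sum of the Challenge series at a few sample points. It assumes all the terms are added with plus signs; the function name and the sample points are mine. It only suggests the answer and proves nothing.

```python
def challenge_partial_sum(x, extra_terms):
    """(1+x)/(1-x) plus the terms 2x^(2^n) / (x^(2^(n+1)) - 1) for n = 0, 1, ..."""
    total = (1 + x) / (1 - x)
    for n in range(extra_terms):
        total += 2 * x ** (2 ** n) / (x ** (2 ** (n + 1)) - 1)
    return total

# Two sample points with |x| < 1 and two with |x| > 1.
samples = {x: challenge_partial_sum(x, 6) for x in (-0.5, 0.5, 2.0, -3.0)}
for x, value in samples.items():
    print(x, value)
```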


5.2.2 Notes 


The theory of uniform convergence is one of the important topics in any course in 
real analysis. In particular, you can find careful definitions and proofs in most real 
analysis textbooks. The “overshoot” we saw in our example is known in the theory 
of Fourier series as the Gibbs phenomenon. 


5.3 Periodic functions and Fourier series


Power series are the simplest kind of series representation for functions, but it can 
be argued that the most useful representation is the one given by Fourier series. The 
example in the previous section is a Fourier series, so we already know that they are 
harder to work with than power series: the convergence is not always uniform. Since, 
however, Fourier series are so useful, in this section we give a very brief survey of 
the main ideas. 

This kind of infinite series is named for the French mathematician Joseph Fourier. 
He was working on the problem of how heat moves through a solid metal plate. 
The problem led to a complicated differential equation that was easier to solve if a 
key function looked like sin(nx) or cos(nx) for some integer n. Solutions could be
added (or “superposed”) to give a new solution, so the problem could be solved for
functions like

\[ c_1 \sin(x) + c_2 \cos(2x) + c_3 \sin(3x), \]

where $c_1, c_2, c_3$ are numbers. Fourier decided to think about infinite sums of that
kind, and asserted that any function would be equal to a sum of such a series. Using
that, he could solve the problem in general.

But was it true? What kind of functions are included in that “any”? What kind of 
convergence does one get? If we can write a function as a Fourier series, how do we
find the coefficients? There was a lot to investigate. This section gives you a quick
glimpse of the answers. 

We will be working with series whose terms involve sines and cosines, so the 
results will be periodic. So we start with a function f(x) which is periodic. To simplify
things,² let’s suppose the period is $2\pi$, as for the sine and cosine, so that we
are working with functions such that $f(x + 2\pi) = f(x)$ for any x. We only need to
know what f(x) is for $-\pi < x < \pi$, since that determines the entire function. We
will often just give a formula valid for $-\pi < x < \pi$ and assume that the rest of the
function is determined by the rule $f(x + 2\pi n) = f(x)$ for every integer n. This is
called “extending by periodicity.”
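
Extending by periodicity is easy to carry out on a computer. Here is a minimal sketch in plain Python; the helper name is made up, and including the endpoint −π in the basic interval is my own choice, since the text leaves the endpoints out.

```python
import math

def extend_periodically(formula):
    """Turn a formula given for -pi <= x < pi into a function of period 2*pi."""
    def extended(x):
        # Shift x by a multiple of 2*pi so that it lands in [-pi, pi).
        wrapped = (x + math.pi) % (2 * math.pi) - math.pi
        return formula(wrapped)
    return extended

# Example: extend the formula f(x) = x.
sawtooth = extend_periodically(lambda x: x)

# Moving x by whole periods should not change the value.
values = [sawtooth(1.0 + 2 * math.pi * n) for n in (-3, 0, 5)]
print(values)
print(sawtooth(4.0), 4.0 - 2 * math.pi)
```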

The kind of series we want to find looks like this: 


\[ f(x) = a_0 + a_1 \cos(x) + b_1 \sin(x) + a_2 \cos(2x) + b_2 \sin(2x) + \dots \]


where the $a_n$ and $b_n$ are numbers. Notice the convention of using $a_n$ for the coefficient
of the cosine and $b_n$ for the coefficient of the sine. Since cos(0) = 1, the constant
term gets called $a_0$. (Warning: In many texts the constant term is $\frac{1}{2} a_0$, for reasons
we touch on below.)

We could write the series in summation notation too: 


\[ f(x) = a_0 + \sum_{n=1}^{\infty} \bigl(a_n \cos(nx) + b_n \sin(nx)\bigr). \]


So our model functions are sines and cosines and we are trying to express general 
periodic functions in terms of those. 

One way to understand this is to notice that each chunk has a shorter period, and 
so a higher frequency. 


Two waves and their superposition. 


²It isn’t hard to modify the theory to allow for other periods, but we will ignore that issue.

• $a_0$ is constant. In the example above we let $a_0 = 0$.

• $a_1 \cos(x) + b_1 \sin(x)$ is a wave with period $2\pi$.

The blue graph shows a wave of this kind, built from cos(x) and sin(x). It has period $2\pi$.

• $a_2 \cos(2x) + b_2 \sin(2x)$ is a wave with period $\pi$.

The green graph is a wave of this kind, built from cos(2x) and sin(2x), which has period $\pi$, so
higher frequency. The red graph is the sum of blue and green, which has period $2\pi$.

• $a_3 \cos(3x) + b_3 \sin(3x)$ is a wave with period $2\pi/3$.

I didn’t add a wave of period $2\pi/3$ to the graph because it got too messy. But
it’s easy to play with SageMath if you want to.

• And so on: $a_n \cos(nx) + b_n \sin(nx)$ is a wave with period $2\pi/n$.


So we can think of each term as being an oscillation with a certain frequency, and 
we are trying to break up a more complicated oscillation into simpler ones with 
higher and higher frequency. The more complicated wave f(x) is therefore the sum 
of many simpler ones. The simple pieces are often called the “harmonics” of the 
more complicated oscillation. 

When the vibration we are studying is a sound wave, all of this is literally true. 
The overall sound is created by superposing a whole bunch of simple sine and cosine 
waves. The overall sound of an instrument is determined by how these simple pieces 
are put together. Sometimes it’s even possible to figure out which instruments were 
used to create a certain sound by studying the harmonics.³

We saw an example of this in the previous section: the sum of sines 


\[ \sin(x) + \frac{\sin(2x)}{2} + \frac{\sin(3x)}{3} + \frac{\sin(4x)}{4} + \dots \]


converged to 


³In 2008 a mathematical analysis of the opening chord of the Beatles’ “A Hard Day’s Night” made the
news. The technique used was a generalization of Fourier series. 

That plot only shows the picture from $-\pi$ to $\pi$, but things involving sines are always
periodic, so the picture will repeat in intervals of length $2\pi$, so the full picture of the
sum is the “sawtooth” function


SageMath draws vertical lines at the jumps; mathematics professors usually pre- 
fer dotted lines. 

Of course, we want to do this in the opposite direction: from the complicated 
oscillation f(x) we want to somehow extract the coefficients $a_n$ and $b_n$ that will
produce a series converging to f(x). Somehow the sawtooth function should tell us
to take $a_n = 0$ and $b_n = 1/n$.

In the case of power series, we did this using derivatives and plugging in zero. 
That can’t work here, since cos(0) = 1: plugging in zero just leaves us with the entire 
cosine part. No fun. Plus, our functions don’t even need to be continuous, so they 
may not have derivatives for us to work with. 

Amazingly, the trick is to use integrals. You may remember that integrating sine 
and cosine over a whole period always gives zero. Indeed, for any integer n > 0 we
have

\[ \int_{-\pi}^{\pi} \cos(nx)\,dx = \int_{-\pi}^{\pi} \sin(nx)\,dx = 0. \]


So if we integrate the equation 
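\[ f(x) = a_0 + \sum_{n=1}^{\infty} \bigl(a_n \cos(nx) + b_n \sin(nx)\bigr) \]

term by term from $-\pi$ to $\pi$, we get zero for all the terms with $n \ge 1$ and are left with

\[ \int_{-\pi}^{\pi} f(x)\,dx = \int_{-\pi}^{\pi} a_0\,dx = 2\pi a_0. \]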


(Well, we get that assuming that we are allowed to integrate this kind of series term 
by term. That may not be true. But let’s just plow on and hope for the best.) 

If you believe that last formula, you can now see how to get the constant term:

\[ a_0 = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x)\,dx. \]


How about the other terms? Well, more integrals. It turns out that if m and n are 
positive integers we have 


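\[ \int_{-\pi}^{\pi} \sin(nx)\sin(mx)\,dx = \begin{cases} 0 & \text{if } m \ne n \\ \pi & \text{if } m = n \end{cases} \]

\[ \int_{-\pi}^{\pi} \cos(nx)\cos(mx)\,dx = \begin{cases} 0 & \text{if } m \ne n \\ \pi & \text{if } m = n \end{cases} \]

\[ \int_{-\pi}^{\pi} \sin(nx)\cos(mx)\,dx = 0. \]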
(If you are good with integrals and trig identities, try to prove those.) 
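
If you would rather check those integrals numerically before proving them, here is a minimal sketch in plain Python. It approximates each integral from −π to π with the midpoint rule; the helper names and the number of pieces are my own choices for this illustration.

```python
import math

def integrate_over_period(g, pieces=2000):
    """Midpoint-rule approximation of the integral of g from -pi to pi."""
    width = 2 * math.pi / pieces
    return width * sum(g(-math.pi + (i + 0.5) * width) for i in range(pieces))

def sin_sin(n, m):
    return integrate_over_period(lambda x: math.sin(n * x) * math.sin(m * x))

def cos_cos(n, m):
    return integrate_over_period(lambda x: math.cos(n * x) * math.cos(m * x))

def sin_cos(n, m):
    return integrate_over_period(lambda x: math.sin(n * x) * math.cos(m * x))

print(sin_sin(2, 3), sin_sin(3, 3))  # different frequencies, then equal ones
print(cos_cos(1, 4), cos_cos(5, 5))
print(sin_cos(2, 2), sin_cos(2, 7))  # a sine times a cosine
print(math.pi)
```

This is a sanity check, not a proof: it only looks at a handful of values of n and m.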

If we multiply f(x) by either sin(mx) or cos(mx) we get a series with terms that
look like sin(nx) sin(mx) and friends. If we then integrate, most of the integrals are
zero. Specifically, the only nonzero integral is the one (only one!) where the trigonometric
functions are the same and n = m. For example, if we integrate f(x) cos(3x),
then the only term in the sum with a nonzero integral will be $a_3 \cos(3x) \cos(3x)$,
with all the others having integral equal to zero.

The upshot (work out the details if you want to) is that, for any integer $n \ge 1$, we
have

\[ a_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x)\cos(nx)\,dx \]

and

\[ b_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x)\sin(nx)\,dx. \]


If we can do all those integrals (a big if!) then we end up with a candidate for a 
trigonometric series for our function f(x). It is called the Fourier series of f (x). 

(Notice that in the formula for $a_0$ we divide by $2\pi$ and in the other formulas we
divide by $\pi$. This is why some books prefer to have

\[ a_0 = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x)\,dx, \]

which forces them to write the series starting with $\frac{1}{2} a_0$.)

Fourier originally claimed that the Fourier series of any function would converge 
to that function, but it wasn’t at all clear what he meant by “any” nor what kind of 
convergence he had in mind. This helped force mathematicians to give a careful 
definition of “function,” after which it was clear that “any” is wrong. After all, if we 
can’t integrate f(x) sin(nx) then we can’t even find the coefficients. 

The eventual conclusion was that the Fourier series of a nice enough function
f(x) will converge to f(x) for almost all values of x. Sorting out the meaning of that
sentence occupied mathematicians for a century after Joseph Fourier. What proper- 
ties make a function “nice enough”? What does “almost all” mean? When is the con- 
vergence uniform? Answers were eventually found, but they required more advanced 
mathematics. If you want to know more, look at the references we give below. All I 
will do here is show you an example. 

Since we will need to compute integrals, let’s choose a really simple function. 
We’ll let f(x) = x for $-\pi < x < \pi$ and then extend it periodically using $f(x + 2\pi n) = f(x)$.
This f(x) is another kind of “sawtooth” function:


Let’s compute the $a_n$ and $b_n$ coefficients. The first group is easy. Remember that
the integral of an odd function in an interval from −b to b is always zero. Since
$f(-x) = -x = -f(x)$ in the interval from $-\pi$ to $\pi$, we have

\[ \int_{-\pi}^{\pi} f(x)\,dx = 0, \]

and so $a_0 = 0$. Since cos(−x) = cos(x), multiplying our function by cos(nx) gives
another odd function, so we see that $a_n = 0$ for all n > 0. So all the coefficients of
the cosines are zero, i.e., the series will only involve sine functions.

Finding the $b_n$ is a bit harder because we actually need to integrate x sin(nx). The
key is to remember integration by parts. Notice that the antiderivative of sin(nx) is
$-\frac{1}{n}\cos(nx)$.

\begin{align*}
\int_{-\pi}^{\pi} f(x)\sin(nx)\,dx &= \int_{-\pi}^{\pi} x\sin(nx)\,dx \\
&= \left[-\frac{x}{n}\cos(nx)\right]_{-\pi}^{\pi} + \frac{1}{n}\int_{-\pi}^{\pi} \cos(nx)\,dx \\
&= -\frac{2\pi}{n}\cos(n\pi) + 0 \\
&= (-1)^{n+1}\,\frac{2\pi}{n}.
\end{align*}

(Notice that $\cos(n\pi) = (-1)^n$ for any integer n.) Since we need to divide by $\pi$ to get
the $b_n$, we end up with

\[ b_n = (-1)^{n+1}\,\frac{2}{n}. \]

This gives the series

\[ 2\sin(x) - \sin(2x) + \frac{2}{3}\sin(3x) - \frac{1}{2}\sin(4x) + \dots \]

When $x = \pm\pi$ this is clearly equal to zero while we had $f(x) = \pm\pi$. So the series does
not converge to f(x) at those points.

But for all other values of x the series turns out to converge to f(x). Here’s a plot 
of what we get by adding the first 100 terms: 


We see that the issue we noted in the previous section happens here as well, but 
away from the jumps everything is working as expected. We have decomposed our 
sawtooth as the sum of infinitely many sine waves. 
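
Here is a minimal sketch in plain Python that adds up the first 100 terms of this series, using the coefficients $b_n = (-1)^{n+1} \cdot 2/n$ computed above. The function names and the sample points are my own choices for this illustration.

```python
import math

def sawtooth_coefficient(n):
    """The coefficient b_n = (-1)^(n+1) * 2/n of sin(nx)."""
    return (-1) ** (n + 1) * 2 / n

def sawtooth_partial_sum(x, terms):
    """Sum of b_n * sin(n*x) for n = 1, ..., terms."""
    return sum(sawtooth_coefficient(n) * math.sin(n * x)
               for n in range(1, terms + 1))

inside = sawtooth_partial_sum(1.0, 100)       # compare with f(1) = 1
at_jump = sawtooth_partial_sum(math.pi, 100)  # every term vanishes at x = pi
print(inside, at_jump)
```

Away from the jumps the partial sum is already close to f(x), though the convergence is slow; at the jump itself the partial sums stay at zero (up to rounding).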


5.3.1 Problems
Problem 5.3.1: What happens if you try to find a Fourier series for f(x) = sin(x)? 


Problem 5.3.2: The “square wave” is the function equal to −1 when $-\pi < x < 0$
and equal to 1 when $0 < x < \pi$. Find its Fourier series.


Problem 5.3.3: Let $f(x) = x^2$ for $-\pi < x < \pi$ extended periodically. Find the
Fourier series for f(x) and plot a partial sum to see if it looks like f(x).


5.3.2 Notes 


Fourier series are a huge subject. Many calculus books give a brief introduction to 
them, but giving a detailed account with proofs requires knowing more. Most books 
on real analysis provide the basic theory. David Bressoud’s A Radical Approach to 
Real Analysis [7] uses the questions raised by Fourier’s claims as the main source 
of motivation. My favorite references for Fourier analysis are [14] and [20], but I’m
a mathematician. There are many other options ranging from very applied to very 
theoretical. 


5.4 Dirichlet series 


This final glimpse of a distant mountain is very different. Power series and Fourier 
series are fundamental tools of applied mathematics and engineering. Dirichlet series 
are mostly of interest to mathematicians (and some physicists). They are particularly 
important in number theory, which is why I love them so much. 

You have already met a Dirichlet series. We looked at series whose terms are the
reciprocals of powers of n:

\[ 1 + \frac{1}{2^x} + \frac{1}{3^x} + \frac{1}{4^x} + \dots \]

When x = 1 this is just the harmonic series, which we know diverges. These series
converge when x > 1 and diverge when x < 1. (You solved all the problems in
Section 3.7.1, right?) We really only wrote them down for integer values of x, but
everything works fine for other real numbers x as well, as long as x > 1. So we can
think of the sum of this series as a function of x: for each x > 1, we define

\[ \zeta(x) = 1 + \frac{1}{2^x} + \frac{1}{3^x} + \frac{1}{4^x} + \dots \]

The squiggle $\zeta$ is a zeta, the Greek letter corresponding to z. It looks like a z whose
diagonal stroke grew too big. 

People have studied $\zeta(x)$ since the eighteenth century. One of Leonhard Euler’s
first mathematical successes came when he found out that $\zeta(2) = \pi^2/6$. He eventually
found a formula for the value of $\zeta(x)$ whenever x was an even integer. He tried
very hard to get a formula that worked for odd integers, but was never able to.
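
Euler’s value for x = 2 is easy to test numerically. Here is a minimal sketch in plain Python; the function name and the number of terms are my own choices for this illustration.

```python
import math

def zeta_partial_sum(x, terms):
    """1 + 1/2^x + 1/3^x + ... with the given number of terms."""
    return sum(1 / n ** x for n in range(1, terms + 1))

approximation = zeta_partial_sum(2, 100000)
target = math.pi ** 2 / 6
print(approximation, target, target - approximation)
```

The partial sums creep up toward the target quite slowly: the gap after N terms is roughly 1/N.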

The zeta function has many interesting and mysterious properties. Let me just
mention one surprising thing about it. We can write $\zeta(x)$ as a product:

\[ \zeta(x) = \left(1 - \frac{1}{2^x}\right)^{-1} \left(1 - \frac{1}{3^x}\right)^{-1} \left(1 - \frac{1}{5^x}\right)^{-1} \left(1 - \frac{1}{7^x}\right)^{-1} \cdots \]

What we have on the right-hand side is an infinite product, with one factor for each
prime number. (Yes, Virginia, there are infinite products as well as infinite sums.)
This means that the zeta function is somehow hiding information about prime numbers
inside it. For example, letting x approach 1 in this formula gives a proof that
there are infinitely many prime numbers.
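
The product formula can also be tested numerically. The sketch below, in plain Python, multiplies the factors for all primes up to 1000 with x = 2 and compares the result with $\pi^2/6$. The helper names and the cutoff are my own choices, and the primes are found with a simple sieve.

```python
import math

def primes_up_to(limit):
    """All primes up to limit, found with the sieve of Eratosthenes."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            for multiple in range(p * p, limit + 1, p):
                sieve[multiple] = False
    return [n for n, is_prime in enumerate(sieve) if is_prime]

def euler_product(x, limit):
    """Product of 1/(1 - p^(-x)) over the primes p up to limit."""
    product = 1.0
    for p in primes_up_to(limit):
        product *= 1 / (1 - p ** (-x))
    return product

product_value = euler_product(2, 1000)
target = math.pi ** 2 / 6
print(product_value, target)
```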

To get a general Dirichlet series, we replace the ones in the numerator with some
other sequence. In general, a Dirichlet series looks like

\[ \sum_{n=1}^{\infty} \frac{a_n}{n^x} = a_1 + \frac{a_2}{2^x} + \frac{a_3}{3^x} + \frac{a_4}{4^x} + \dots \]

If we can control how fast the numerators $a_n$ grow, we can decide for which range
of x this converges, and so get a new function. The region of convergence is always
a half-line x > something.

Here’s an easy example. For each positive integer n, let d(n) be the number of
positive divisors of n. So d(1) = 1, because 1 is the only divisor of 1, d(2) = d(3) = 2
and similarly for every prime number, but d(4) = 3 because 1, 2, 4 are all divisors
of 4. Now look at

\[ \sum_{n=1}^{\infty} \frac{d(n)}{n^x} = 1 + \frac{2}{2^x} + \frac{2}{3^x} + \frac{3}{4^x} + \frac{2}{5^x} + \frac{4}{6^x} + \dots \]

Since d(n) is quite small compared with n, it turns out this also converges whenever
x > 1. The sum is actually $\zeta(x)^2$; can you see why?
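
Before trying to see why, you might want to check that it is plausible. Here is a minimal sketch in plain Python that compares a partial sum of the divisor series with the square of a partial sum of the zeta series. I use x = 3 because the convergence is faster there; the function names and the number of terms are my own choices.

```python
def divisor_counts(limit):
    """counts[n] is the number of positive divisors of n, for n up to limit."""
    counts = [0] * (limit + 1)
    for divisor in range(1, limit + 1):
        for multiple in range(divisor, limit + 1, divisor):
            counts[multiple] += 1
    return counts

def zeta_partial_sum(x, terms):
    """1 + 1/2^x + 1/3^x + ... with the given number of terms."""
    return sum(1 / n ** x for n in range(1, terms + 1))

def divisor_series_partial_sum(x, terms):
    """Sum of d(n)/n^x for n = 1, ..., terms."""
    counts = divisor_counts(terms)
    return sum(counts[n] / n ** x for n in range(1, terms + 1))

terms = 2000
divisor_sum = divisor_series_partial_sum(3, terms)
zeta_squared = zeta_partial_sum(3, terms) ** 2
print(divisor_counts(6)[1:])  # d(1), ..., d(6)
print(divisor_sum, zeta_squared)
```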

The real usefulness of these series, however, requires allowing x to be a com- 
plex number. In that case the region of convergence is always a half-plane Re(x) > 
something. For example, the series for the zeta function converges for all complex 
numbers x with Re(x) > 1. 

It’s over the complex numbers that things get really interesting... but it’s also 
hard, and this short book is already getting too long, so I’ll leave you with that.


5.4.1 Notes 


Dirichlet series are discussed in books on analytic number theory, such as [1] and 
[21]. Most such books are hard going and assume at least some knowledge of com- 
plex analysis. There have, however, been several attempts to explain the zeta func- 
tion. I like [9] and [17]. The zeta function features as one of the “Millennium Prob- 
lems” (see [10]), surely the hardest way to make a million dollars. 


A SageMath: A (Very) Short Introduction


Mathematical software is part of the standard toolkit of mathematicians. Throughout 
this book, I will use the program called SageMath [18] (sometimes called Sage, but 
there are various other software systems called “Sage’’) to do computations. The goal 
of this appendix is to provide some basic orientation on the program, including where 
to get it. SageMath is free; it can be used online or installed on your own machine; it 
is very powerful, and it takes some time to learn. SageMath incorporates the func- 
tionality of many other free mathematics programs, including Maxima, Octave, R, 
GAP, and GP. SageMath runs either in a browser window or a terminal. 

Of course, there are other mathematical software tools. Desmos, Geogebra, Maple, 
Mathematica, and MATLAB are well known and quite powerful. The people who 
make Mathematica are also behind Wolfram Alpha, which can do lots of mathemat- 
ics as well. Many students appear to be familiar with Desmos and Geogebra. These 
can all do some of the things we need to do, but I have decided to focus on SageMath. 


A.1 General information 


SageMath is an ambitious attempt to create powerful mathematical software that 
is free and open source. The ultimate ambition, says the Sage home page, is to create 
“a viable free open source alternative to Magma, Maple, Mathematica and MATLAB.”
(I’d say that they are very close to that and in some aspects well beyond.) The
development approach emphasizes openness: while William Stein is the leader of 
the team, contributions have come from across the mathematical community. 

SageMath can be used through a web interface, without needing to download 
and install the program. There are two ways to do that: either SageMath Cell or the 
more elaborate interface based on projects and notebooks offered by CoCalc. Most 
of my students find SageMath Cell is sufficient for everything they need to do. 

It is also possible to download and install the program on your own computer. 
When you do, you can run SageMath in a terminal window or you can run it in your 
browser. The latter is much like using CoCalc, but the program is running on your 
local machine. 

Of the two web interfaces, SageMath Cell is particularly easy to use for small 
computations. It presents the user with a big blank rectangle where one can type in 
Sage commands. Below is a button labeled “Evaluate,” which does exactly that. The 
output appears below. The downside is that SageMath just evaluates what is in the 
box. It will not remember what you did, so if you define a symbol and then press the
“Evaluate” button, you cannot use it again without repeating the definition. There are 
two very useful features of SageMath Cell that deserve note. First, it works on a tablet 
or phone. Second, because SageMath incorporates other open-source mathematical 
software, SageMath Cell will also allow you to use those: right under the big blank 
rectangle, there is a “Language” drop-down that you can use to do this. 

The SageMath home page offers a link to create a CoCalc Instant Worksheet, but 
if you plan to use it regularly and want to save your work from one session to the 
next you should create a (free) account. (There are also paid CoCalc accounts if you 
want/need more features.) Once you log in to your account, you can create projects, 
and each project can contain many notebooks. Notebooks allow you to enter lines of 
Sage code, which are evaluated when you hit Shift-Enter. Definitions and results are 
remembered within each session. 

If you are going to use the program a lot, then you might want to download and 
install it. It takes quite a bit of space, but having it on your own machine avoids 
connectivity issues. 

The central web site for SageMath is www.sagemath.org. You can go there to
download the program, find documentation, and so on. 


A.2 First examples


It’s time to show you some examples. You can find many more in A Tour of 
Sage, which is available online. SageMath is built using the Python programming 
language, so its syntax is similar to Python’s. 

Everything is done by entering a command and then asking SageMath to evaluate 
it. In SageMath Cell, you can enter a sequence of commands as well. If you enter 


3 +5 
into SageMath Cell and hit “Evaluate” you will get 
8 


in the results box. SageMath Cell typically only prints your last result. If you want 
to see two results, it’s best to ask it explicitly: 


print(3+5)
print(57.1^100)


Then you get two lines: 


8 
4.60904368661396e175 


(In the last answer, e175 means $\times 10^{175}$.) You can get some space between the two
lines of output by adding print(" ") between the two commands.

In the terminal window, SageMath gives you a prompt and you type in your 
commands and then hit enter. The same example looks like this: 


sage: 5+3
8
sage: 57.1^100
4.60904368661396e175


Most of the examples we give have been done in a terminal window (for one thing, it 
is easier to cut and paste from the terminal), so you will see prompts and responses. 

The CoCalc or Jupyter Notebook interface is like the terminal, but it allows for 
graphical output. You enter commands and hit shift-enter to execute them. If you 
define an object with a command like v=vector([1,2,3]) in CoCalc, the terminal,
or a notebook running on your own computer, SageMath will remember what v is 
so that you can use it again later. 

SageMath requires you to declare that a letter is a variable before using it that 
way. The only variable it knows by default is x. So, to work with functions of two 
variables x and y you will need to begin with 


sage: var('x y')
(x, y) 


If you use a variable, say d, that hasn’t been defined, you get an error message. 
SageMath error messages are usually long and cryptic, but the key information is 
usually at the bottom; this one will end with something like “name ‘d’ is not defined.” 
(A brand-new feature of CoCalc allows you to ask ChatGPT what is wrong with your 
code.) 

Here is one way to define a function (everything in SageMath can be done in lots 
of different ways). 


sage: f(x)=x^4
sage: f
x |--> x^4
sage: f(x)
x^4


The command f(x)=x^4 produces no output, but SageMath has been told what you
mean by f. Notice also that SageMath distinguishes the function f (the rule that
sends x to $x^4$) from f(x) (the value of f when you plug in x). All the grown-ups do
this, but I haven’t done it in this book.

You can compute limits. 


sage: f(x)=(x-2)/(x+1)
sage: limit(f(x),x=1)
-1/2
sage: limit(f(x),x=oo)
1
sage: limit(f(x),x=-1)
Infinity

Notice that you can use oo (two lowercase os) to mean infinity.
Two ways to compute a derivative (here f is again the function f(x) = x^4):


sage: diff(f,x)
x |--> 4*x^3
sage: diff(f(x),x)
4*x^3


(On SageMath Cell, you need to define f each time, remember.) The first one is the
derivative function f′, the second is its value f′(x). Higher derivatives are easy too. Here’s
the second derivative:


sage: diff(f,x,2)
x |--> 12*x^2
sage: diff(f(x),x,2)
12*x^2


The command diff(f,x,x) is equivalent to diff(f,x,2). SageMath also uses (in
fact, prefers) the object-oriented convention:


sage: f.diff(x)
x |--> 4*x^3
sage: f(x).diff(x)
4*x^3
sage: f(x).diff(x,4)
24


One advantage of this syntax is that either in the terminal window or in a notebook 
you can write f. and then hit the tab key. SageMath will then open a pop-up window 
telling you the many things you can do to the function f. 

Here’s how to integrate: 


sage: f(x).integrate(x)
1/5*x^5
sage: f.integrate(x)
x |--> 1/5*x^5
sage: integrate(f(x),x)
1/5*x^5
sage: integrate(f(x),x,2,3)
211/5
sage: f(x).integrate(x,2,3)
211/5


The last one is the definite integral

\[ \int_2^3 f(x)\,dx. \]

Let’s try a hard one:


sage: g(x)=integrate(sqrt(x)*sqrt(1+x),x)
sage: print(g)
x |--> 1/4*((x + 1)^(3/2)/x^(3/2)
+ sqrt(x + 1)/sqrt(x))/((x + 1)^2/x^2
- 2*(x + 1)/x + 1) - 1/8*log(sqrt(x + 1)/sqrt(x) + 1)
+ 1/8*log(sqrt(x + 1)/sqrt(x) - 1)


The first command produces no output, so I used print to tell SageMath to show me 
the result. The big mess is printed out as a single line, but I have added line breaks 
here. 

Notice that that answer uses the function log. That means the natural logarithm; 
SageMath does understand what ln means, but it defaults to using log, which is what
most mathematicians do. Indeed,


sage: ln(2)
log(2)
sage: ln(2).n()
0.693147180559945


SageMath tries to give you an exact answer when it can, but you can force an approx- 
imate answer in decimal form by using the n command. 

On SageMath Cell or a notebook, you can get better-looking output by using 
show instead of print. For example, in SageMath Cell if we enter 


x = var('x')
g(x)=integrate(sqrt(x)*sqrt(1+x), x)
show(g(x))


and hit evaluate, we get something like: 


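\[ \frac{\frac{(x+1)^{3/2}}{x^{3/2}} + \frac{\sqrt{x+1}}{\sqrt{x}}}{4\left(\frac{(x+1)^2}{x^2} - \frac{2(x+1)}{x} + 1\right)} - \frac{1}{8}\log\left(\frac{\sqrt{x+1}}{\sqrt{x}} + 1\right) + \frac{1}{8}\log\left(\frac{\sqrt{x+1}}{\sqrt{x}} - 1\right) \]

There is also latex(f(x)), which is what I did to get the LaTeX code to typeset the
result. (If you use show in the terminal, you get the LaTeX code.)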

While show produces output that is nicer to look at, the output of print is easier 
to cut and paste. In general, it’s best to ask SageMath to either print or show the 
outputs you want to see. 

Maybe there’s a way to simplify that monster? One way is to try 


sage: g(x).simplify_full() 
1/4*(2*x + 1)*sqrt(x + 1)*sqrt(x) - 1/8*log((sqrt(x + 1) 
+ sqrt(x))/sqrt(x)) + 1/8*log((sqrt(x + 1) - sqrt(x))/sqrt(x)) 


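(I have added a line break.)
If we ask SageMath to show that, we get

\[ \frac{1}{4}(2x+1)\sqrt{x+1}\,\sqrt{x} - \frac{1}{8}\log\left(\frac{\sqrt{x+1}+\sqrt{x}}{\sqrt{x}}\right) + \frac{1}{8}\log\left(\frac{\sqrt{x+1}-\sqrt{x}}{\sqrt{x}}\right) \]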


I guess that is a little better. There are various different versions of simplify: 


f.simplify()
f.simplify_full()
f.simplify_log()
f.simplify_trig()
f.simplify_rational()


The first one doesn’t usually help; the others can help depending on the situation. 
See the manual for more details. 

Sometimes you also want to use factor() and expand() to get formulas to look
right. Occasionally, simplify and others will tell you that they need more information
(e.g., “is x > 0?”). In that case, you get a long error message that tells you what
to do at the end (most SageMath error messages give their most useful information at
the bottom). You can use assume(x>0) or similar commands to provide the needed
information.

For example, if you wanted SageMath to simplify $\log(\sqrt{x})$ into $\frac{1}{2}\log(x)$, you
would need assume(x>0). You might also need to use simplify_log. That might
help make the formula for g(x) above a little less intimidating.


A.3 Plotting 


SageMath can also plot functions. For example 
sage: plot(f(x), (-2,2))


will display a plot of our function. (In a terminal window, SageMath will use your 
computer’s default graphics program to show you the result.) Plotting functions have 
a zillion options; type ?plot for documentation and examples. I particularly enjoy 
playing with the various options related to color. 

You can also give your plot a name and then do things to it. 


sage: P1=plot(f(x), (-2,2), color="blue")
sage: P2=plot(diff(f(x),x), (-2,2), color="red")
sage: P=P1+P2
sage: show(P)


Produces this:

[Figure: the graph of f(x) in blue and of its derivative in red, for x between −2 and 2.]


There are lots of other plotting commands. To plot a curve given by an equation
like $y^2 = x^3 - x$, do this in the cell server:


var('x y')
implicit_plot(y^2==x^3-x, (x,-2,2), (y,-2,2))


(Note the double equals sign!) The graph looks like this:

[Figure: the curve $y^2 = x^3 - x$, with x and y between −2 and 2.]

How did I get the plots into this LaTeX file? I used
sage: P.save('example.eps')


to create an encapsulated postscript (eps) file, which I then included into my file. The 
save command will use the format indicated by your file name, so you can choose 
the graphics format of your choice. 

Sage will make three-dimensional graphs as well. For example, let’s look at the
equation in three variables $y^2 z = x^3 - x$. If we plot the points (x, y, z) that satisfy that
equation, we’ll get a surface in three-dimensional space. Here’s how to get SageMath
to draw it for you.


var('x y z')
implicit_plot3d(y^2*z==x^3-x, (x,-2,2), (y,-2,2), (z,-2,2))


This gives a rotatable graph of the surface defined by that equation. The graph displays
in a browser window; if you are using the terminal, it will open a new browser
window. It looks something like this:

[Figure: the surface $y^2 z = x^3 - x$ in three-dimensional space.]


A.4 Sums


Since this is a book about sums, it’s good to know how to handle them in Sage- 
Math. Of course, you can just write them out. 


sage: 1+1/2+1/3+1/4+1/5
137/60 


That becomes hard to do when you have lots of terms, however. So, let’s declare a 
variable k and use it to describe the general term of the sum instead: 


sage: var('k')
k
sage: sum(1/k,k,1,5)
137/60


(I won’t repeat the definition of k again; if you are working with SageMath Cell, it 
should be done each time.) 
Now we can do a larger sum just as easily: 


sage: sum(1/k,k,1,20)
55835135/15519504


SageMath always prefers to give the exact answer, which in this case is a fraction. 
But we usually want the decimal approximation instead. The easiest way to do this 
is to make SageMath work with decimals throughout, by replacing 1 with 1.0:


sage: sum(1.0/k,k,1,20) 
3.597739657143682
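
The same distinction between exact and decimal answers can be reproduced without SageMath. Here is a minimal sketch in plain Python, using the standard fractions module for the exact sums; the function names are made up for this illustration.

```python
from fractions import Fraction

def harmonic_exact(terms):
    """1 + 1/2 + ... + 1/terms as an exact fraction."""
    return sum(Fraction(1, k) for k in range(1, terms + 1))

def harmonic_decimal(terms):
    """The same sum computed with decimal (floating point) arithmetic."""
    return sum(1.0 / k for k in range(1, terms + 1))

exact_5 = harmonic_exact(5)
exact_20 = harmonic_exact(20)
decimal_20 = harmonic_decimal(20)
print(exact_5)     # an exact fraction
print(exact_20)    # an exact fraction
print(decimal_20)  # a decimal approximation
```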


You can even try an infinite sum: 


sage: sum(1/k^2,k,1,oo)
1/6*pi^2


But that doesn’t always work. It certainly fails when the sum does not converge. 
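
When a closed form is not available, one can still look at partial sums numerically. The following is a minimal plain-Python sketch of that idea (not SageMath code); the helper name partial_sum and the cutoffs 10, 1000, and 100000 are arbitrary choices made for illustration.

```python
import math


def partial_sum(term, n):
    """Add term(1) + term(2) + ... + term(n) in floating point (hypothetical helper)."""
    return sum(term(k) for k in range(1, n + 1))


# Partial sums of 1/k^2 settle down near pi^2/6.
for n in (10, 1000, 100000):
    print(n, partial_sum(lambda k: 1.0 / k**2, n), math.pi**2 / 6)

# Partial sums of 1/k keep growing (slowly), so there is no limit to find.
for n in (10, 1000, 100000):
    print(n, partial_sum(lambda k: 1.0 / k, n))
```

Numerical evidence of this kind is suggestive but never a proof: the partial sums of 1/k grow so slowly that a short table could easily be mistaken for convergence.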


B Why I Do It This Way 


The theory of infinite series is a beautiful part of analysis and it is used throughout 
mathematics. Series are used a lot in applied mathematics, physics, and engineering. 
Nevertheless, I have heard from many mathematics majors (at Colby and elsewhere) 
that this part of their calculus courses had made no sense to them. Worse, they said 
that they did not remember any of it. That convinced me that another approach was 
needed. Whether they will remember this one, I don’t really know, but I hope at least 
it works a little better.¹ I also hope that it will lead to lasting understanding that will
persist beyond the final exam. 

The usual approach is the logical one: before we can talk of series of functions, 
we need to consider numerical sequences, define convergence, define series, and so 
on. I am convinced, however, that the logical order is not necessarily the pedagogical
order. So, things got rearranged and some topics were jettisoned. 

It seems to me, first of all, that if we start with a numerical series we are essen- 
tially answering a question the students do not (yet) want to ask. Why would one 
want to add infinitely many numbers in the first place? It also seems foreign to what 
they have learned so far. Isn’t calculus supposed to be about functions? Where are 
the derivatives and integrals we were learning about so far? 

Another annoyance with the traditional sequence is the abundance of “does it
converge?” problems. Students often don’t see the point of saying “it converges” without
finding what it converges to. And they are right! Knowing that a series converges is 
not an answer; rather it’s the basis for a question: what is the sum? Euler spent a long 
time thinking about that question for the zeta function. 

Clearly, I was looking for a narrative, one that started from things that were rec- 
ognizably calculus and led to infinite series. Power series provided such a narrative. 
That gave me an organizing principle: if I don’t need this result in the elementary 
study of power series, then I don’t need this result at all. 

I decided to start from the idea that the derivative provides the best linear approx- 
imation to a (differentiable) function. Since many calculus courses do not emphasize 
this idea as much as I would like, I dedicated an entire chapter to it.² This also serves
to set up the rules for studying approximations, introducing reference points, incre- 
ments, error terms, etc. I later build on this when I define derivatives in the multi- 
variable context, so it’s valuable to have. 

Trying to improve the linear approximation leads naturally to polynomial
approximations and the main theorem on Taylor Polynomials. The simple examples
(exponential, sine, cosine, and geometric) then lead to questions of convergence. At this
point, I do enough of the theory of convergence to allow students to find the radius
of convergence for (simple) power series, and so to define new functions.

¹ It works in my calculus classes, but that may just reflect the fact that I like it.
² Instructors may be tempted to skip this chapter. My recommendation is not to do this.

This way of doing things is not easier for the students: the idea of approximation 
remains foreign to most elementary mathematics instruction. Students often ask me 
about the goal. Why are we doing this? Where are we heading? To answer those 
questions I introduced the idea of decimal expansions as analogous to power series 
expansions. Depending on the students, this idea may get more emphasis in class 
than in the notes. 

Throughout the book, I have used “the function f(x)” rather than being careful 
about the distinction between a function and its value. This is the language used in 
most calculus textbooks and it is the language students understand at this point. It 
seemed too heavy-handed to insist on the correct language right in the first paragraph 
of the introduction. Instructors should feel free to point out this “abuse of language” 
to students, of course. One might use SageMath, which distinguishes between f and 
f(x), to make the point more vivid.

I believe students should understand that theorems depend on assumptions, but 
I’m not sure that students at this level should worry about things like “twice differen- 
tiable” and friends. I have sometimes given precise assumptions, but at other times 
relied on more informal language such as “nice enough.” I’m sure instructors will be 
able to provide more precise assumptions if they feel students can benefit from that. 

In many places, I have given justifications and arguments for why certain things 
are true. Students in my target audience are not ready for formal proofs, but I have 
tried to move them in that direction. My informal rule for informal proofs is that 
what I say should reflect the content of the proof. In other words, I have tried not to 
lie. 

Like many (most?) teachers of mathematics, I have shamelessly borrowed from 
others, both in the exposition and in the problem sets. I am sure that I do not remem- 
ber all my sources. I know I have been influenced by George Welch, Ralph Boas 
(especially the articles on infinite series in [4]), T. W. Körner’s Calculus for the
Ambitious [13], the first editions of the calculus textbook [12] by Hughes-Hallett et
al., and Calculus Problems for a New Century [11], ed. by Robert Fraga. I am deeply
grateful for their insights and ideas. In the Instructor’s Guide, I have provided precise 
credits for the problems whenever I could. 